

Video Extrapolation Method Based on Time-Varying Energy Optimization and CIP

Hidetomo Sakaino, Senior Member, IEEE

Abstract—Video extrapolation/prediction methods are often used to synthesize new videos from images. For fluid-like images and dynamic textures as well as moving rigid objects, most state-of-the-art video extrapolation methods use non-physics based models that learn orthogonal bases from a number of images, but at high computation cost. Unfortunately, data truncation can cause image degradation, i.e., blur, artifacts, and insufficient motion changes. To extrapolate videos that more strictly follow physical rules, this paper proposes a physics-based method that needs only a few images and is truncation-free. We utilize physics-based equations with image intensity and velocity: optical flow, Navier-Stokes, continuity, and advection equations. These allow us to use partial difference equations to deal with local image feature changes. Image degradation during extrapolation is minimized by updating the model parameters with a novel time-varying energy balancer model that uses energy-based image features, i.e., texture, velocity, and edge. Moreover, the advection equation is discretized by a high order Constrained Interpolation Profile for lower quantization error than can be achieved by the previous Finite Difference Method in long-term videos. Experiments show that the proposed energy based video extrapolation method outperforms state-of-the-art video extrapolation methods in terms of image quality and computation cost.

Index Terms—Video, image features, optical flow, physics, energy, time-varying, optimization, CIP, image quality.

Manuscript received November 12, 2015; revised March 22, April 27, 2016; accepted June 6, 2016. Date of publication X; date of current version X. This paper was recommended by Associate Editor D. Tzovaras.
H. Sakaino is with Network Technology Laboratories, Nippon Telegraph and Telephone Corporation, Tokyo 180-8585, Japan (e-mail: s.hidetomo@lab.ntt.co.jp).
This paper has supplementary downloadable material available at http://ieeexplore.ieee.org, provided by the author. It includes videos. Contact s.hidetomo@lab.ntt.co.jp for further questions about this work.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIP.2016.2579307

I. INTRODUCTION

Video extrapolation is the synthesis of new image sequences from images. It is one of the important methods for generating desired images from limited image data sources. It offers a very wide range of applications in image processing, computer vision, and multimedia, for image completion, different-viewpoint image generation, and compensation of time delay in data transmission systems. The purpose of image extrapolation is to synthesize plausible, rather than accurate, new images [1], [7], [12], [13], [32], [80]–[82]. Recently, the field of view of a photograph has been extrapolated from given image regions under constraints [80]. In order to synthesize a longer video from an original short video, repeating or periodic sequences are extrapolated from the video [1], [2]. For videos with moving rigid objects, various new videos are generated by changing the original frame order to different frame orders [18], where smooth transition constraints between frames are used. For novel videos with fluid-like images or dynamic textures, applying the statistical approach, a large number of images are used to learn linear models by the maximum likelihood approach [2], [5], where non-physics based equations are used to extrapolate new images. However, due to the large amount of learning image data needed, data truncation, data smoothing, and/or data approximation by limiting the number of orthogonal bases [1]–[4], i.e., low frequency, are required, where Principal Component Analysis (PCA) [3] and Singular Value Decomposition (SVD) [4] with very large matrices have been applied at high computation cost. The resulting videos are likely to have significantly lower image quality than the original due to the loss of the high frequency components, i.e., blur, from the original video. Therefore, even with moving rigid objects, their original edges and contours can be smoothed by PCA and SVD based video extrapolation methods. Also, since no physical motion models are contained in PCA and SVD, randomized, repeating, and averaged videos can be synthesized, where locally large movements can also be ignored while learning the original videos. Therefore, learning and extrapolating videos from a smaller number of images without the loss of local image features has long been desired. Such local image features contain appearance features that must be extracted, from image intensity, i.e., texture and shape, to motion, i.e., velocity/optical flow. It is assumed that the image intensity and motion of objects in videos will exhibit variation due to their different properties such as rigidity, elasticity, and fluidity. In particular, natural scenes and objects exhibit motion and deformation that follow physical rules. A number of physical models/equations [24] have been used to simulate and animate quantities, i.e., energy, potential, and image, due to their advantageous property of time-varying local feature changes. However, the model parameters of physical equations are empirically provided or learned from videos, but they remain constant during synthesis [24], [87], as do those of non-physics based extrapolation methods [1]–[6]. Moreover, when solving discretized equations in the computational grids of an image, image degradation by quantization error is assumed to be inevitable. In terms of image-based energy, i.e., physics-based image features, such image degradation can be viewed as energy loss. Therefore, minimizing energy loss and accumulation error during image extrapolation may maintain image quality over time. In this paper, we call this energy preservation. Since image energy can be assumed to consist of different sub-energy components, we can utilize physics-/energy-based image features related to texture, shape, and motion.

In videos, texture may be present and range from flat, weak, to strong features. Shape may change from weak to strong edge and contour features. Motion may vary from zero, slow, to fast features, and from linear, i.e., line, to non-linear, i.e., curve and rotation, features. For fluid-like images and dynamic texture in videos, time-varying but motion-, texture-, and shape-free image extrapolation equations are needed, unlike the previous oscillating-surface oriented model [24], i.e., a randomized motion model.

To this end, this paper presents a video extrapolation method based on energy optimization that requires only a few images for learning. Initial motion is estimated by physics-based optical flow [78]. Using the estimated optical flow, for updating velocity with linear and non-linear motion, i.e., rotation, velocity is extrapolated by physics-based equations, i.e., the Navier-Stokes (NS) and continuity equations, which are introduced from fluid dynamics. The property of velocity with respect to magnitude and orientation can be changed by the model parameters, i.e., viscosity and density, of the NS equations. New images are spatio-temporally and physically extrapolated by the advection equation, whose parameters include image intensity and velocity. Owing to the advection equation, local image intensities can be changed according to local velocity vectors. Since the NS and continuity equations are texture- and shape-free, this paper fully utilizes such characteristics of physical equations for image extrapolation. In order to ensure image quality with respect to image intensity and motion in the image extrapolation framework, we propose an energy preservation model, i.e., a Kolmogorov turbulence energy model [33], [34], as a constraint for the optimization of time-varying physical model parameters. This differs from existing image synthesis methods, which employ non-physical optimization [32]. In [87], the initial optical flow is time-varyingly extrapolated, but the other physical model parameters remain constant. The turbulence energy theory [33], [34] is known to well describe natural phenomena that show a stochastic characteristic between energy and the frequency of moving objects. Its basic behavior contains development, decay, and the state of inertia. By balancing them, it can be assumed that the dynamic texture of moving objects can be maintained over time. Therefore, this paper introduces an appropriate energy balancer to maintain image quality during extrapolation. In other words, image degradation refers to energy loss, and energy loss can be recovered by increasing the external energy. Since the spatio-temporal property of a physical equation can be changed by its model parameters, this advantage can be utilized for image quality modifications via energy changes. Changing the model parameters allows the same extrapolation equations to cover a wider range of deformation, i.e., motion, texture, and shape, than is possible with the constant model parameter approach. This paper also proposes to integrate different image-feature-based energies, i.e., edge, curl, and divergence, for characterizing the extrapolated images. Since discretization of physics-based equations, i.e., partial difference equations, can accumulate quantization error over time, this paper is the first to approximate images/videos by an extended Constrained Interpolation Profile (CIP) method, from third [30] to fifth order approximation, which advects/moves images according to their intensity and gradient; previous image extrapolation methods generate significant image blur due to their low order approximation [27]–[29] and Finite Difference Method [87]. CIP requires no data smoothing and no original long-term images. Thus, our video extrapolation method is frequency-band-independent, i.e., texture-free and shape-free as regards images/objects. Along with our proposed energy balancer, this offers large local deformation and evolution over time, including rigid objects with motion, i.e., zooming in on a rigid scene and rotating rigid objects. Experiments on many videos spanning a variety of scenes/objects, from images with rigid objects and fluid-like images to dynamic texture, show that our proposed method outperforms state-of-the-art video extrapolation methods in terms of consistent spatio-temporal image quality with regard to motion and texture.

The contributions of this paper are four-fold: 1) For video extrapolation, a novel physics-based energy optimization scheme is proposed that uses image-feature based energies. 2) Model parameters are updated during extrapolation, whereas existing image/video synthesis methods are based on non-physics based methods whose model parameters remain constant over time. 3) The proposed extrapolation method can generate new videos with less image degradation from a smaller number of images than previous methods. Previous learning based methods require a large number of original images along with frequency band truncation. 4) The advection equation with optical flow uses a high order Constrained Interpolation Profile to realize energy-lossless image generation due to the extrapolation of image intensity and gradient at the same time; state-of-the-art video extrapolation methods handle only image intensity.

This paper is organized as follows: Section II describes related work. Section III discusses our proposed extrapolation approach with the time-varying energy balancer. Section IV conducts evaluation experiments. Finally, Section V concludes our paper.


II. RELATED WORK

A number of video synthesis methods such as extrapolation and prediction have been reported. In [1] and [2], a large number of images are used to learn model parameters of parametric functions, i.e., low order spatio-temporal autoregression models. For data compression from a large number of training images, the orthogonal decomposition of the Principal Component Analysis based extrapolation method [3] has been used. However, significant image degradation occurs due to least-squares data fitting and/or the low spatial frequency of images: average images are synthesized by excluding local large motion in the original videos. In [4], higher order Singular Value Decomposition enhances image quality more than the methods of [1] and [2]. However, existing state-of-the-art image synthesis methods [1]–[4] require many past images to learn model parameters with high reliability. In such orthogonal decomposition [3], [4], the large matrices incur high computation cost. In [5], a linear state space model is introduced for extrapolating new videos from original videos. Due to non-causality in optimization control theory, only the periodic motion changes [1]–[4] present in the learning videos can be generated. Moreover, unlike our extrapolation method, motion features cannot be directly obtained.

In order to avoid perceptible image degradation, different approaches have been proposed [18]: the ordering of the original video frames is changed to extend play duration. Animations have been created from multiple images [25], [26]. Since only repeated scene changes are generated, no prediction with physically plausible changes is possible. Using start and end images, input images can be warped/morphed by either manual correspondence or computer vision techniques [19], [20], [22]. Most recently, the correspondence between two different frames has been improved by using moving gradients to handle occlusion [7]. Different viewpoints [13] are also presented in [7], [20]. Using two consecutive frames, forward and backward optical flows are computed and used to extrapolate video watercolorization [9]. As mentioned above, previous image interpolation methods are based on non-physical models. For inpainting, a number of extrapolation and interpolation methods have been proposed [8], [10], [11], [16], [35], [38]. In [10], a fluid based model is proposed, but it is limited to the filling in of narrow holes surrounded by texture-less or flat image propagation.

For synthesizing motion and texture over time, the model in [21] assumes that dynamic textures such as waterfalls and snowflakes can be approximated by hundreds of thousands of small deformable particles with stochastic trajectories. However, this approach cannot be applied to videos demanding the use of object-specific particle models with hand-crafted design. For oil painting [24], a wave equation has been used to generate randomly oscillating animations from a single image, e.g., a lake with wind, where the model parameters have to be empirically tuned. This is specific to wave surface changes. Unlike our proposed extrapolation equation, a wave equation [24] does not contain physical terms, i.e., advection and diffusion terms, and so is unable to move objects with translation, i.e., line and curve, rotation, i.e., vortex, and diffusion. Moreover, since a wave equation does not provide a velocity term, estimated velocity, i.e., optical flow, cannot be used, whereas our extrapolation method can cope with various motion changes, i.e., weak to strong waves, from videos for image extrapolation. In videos with static and dynamic objects/scenes, a wave-equation-based method [24] requires dynamic objects/scenes [88] to be segmented from static ones. On the other hand, our proposed method moves dynamic image regions with velocity and static image regions with zero velocity, so no segmentation is needed.

In addition to the above video-based extrapolation methods, an image can be used to extrapolate a wider field of view [84], [86] than the original one by learning from a roughly aligned and wide-angle guide image of the same scene [80]. Also, the image transformations of rotation, scaling, and reflection are incorporated under hierarchical graph optimization. Interactive digital photo-montage is introduced in [83]. In [82], image completion for filling in missing image regions and deleting unnecessary objects is proposed; the method uses the statistics of similar patches, i.e., patch offsets, where both matching-based and graph-based methods are used. However, this method is less suitable for inpainting and outpainting with motion changes, where no physical model is used. In order to fill in holes in an image, millions of photographs are collected by Internet searches [85]. However, since the methods in [80], [82], [85] use the shift-map image synthesis framework, no physical property is modeled and no video application has been shown. The global optimization problem, i.e., global spatio-temporal consistency between all patches in and around a hole, is used by [81] to achieve space-time completion of video to eliminate undesired static or dynamic objects, modification of a visual story by replacing unwanted elements, correction of missing/corrupted video frames in old movies, creation of new textures by extending smaller ones, and creation of complete field-of-view stabilized video. However, this model is effective only when video sequence S has global visual coherence with some other sequence D, i.e., every local space-time patch in S can be found somewhere within sequence D. Thus, this method is based on the assumption of the existence of videos that demonstrate high correlation between past, present, and future of scenes/objects. Therefore, this method cannot generate new image sequences with physically evolving objects. In [32], an optimization-based texture synthesis method has been reported for a static image, where the similarity error between local different patches is minimized in terms of image intensity and its gradient.

A weather radar pattern prediction model was introduced in [87], but its model parameters are, except for optical flow, constant during prediction even for different image sources such as precipitation and satellite radar images. Moreover, image quality is, in terms of prediction accuracy, degraded, particularly over long prediction periods, as there is no image quality preservation model and discretization is by the Finite Difference Method (FDM). FDM simply relies on updating image intensity over time; this paper proposes to enhance image quality by extending the Constrained Interpolation Profile (CIP) [30] to higher order. CIP can extrapolate both image intensity and its gradient at the same time. It is noted that CIP has not been used for video extrapolation so far.

Thus, state-of-the-art video extrapolation methods use no physics based optimization models and no time-varying model parameters.

III. THE PROPOSED EXTRAPOLATION METHOD

This section shows our proposed extrapolation method for videos. Motion and image intensity are estimated and extrapolated under the proposed energy balancer.


A. Optical flow and advection equations

In order to extrapolate new videos from original videos, we denote the mathematical and physical relationships between velocity, image intensity, and time. Basically, a video-based extrapolation model should rely on present and past images. Therefore, we first derive the relationship between such images. Let $I \equiv I(x, y, t)$ be the image intensity at two-dimensional coordinates $(x, y)$ at time coordinate $t$ in an image. In computer vision, motion estimation, i.e., the optical flow model, is used, where one of the simplest assumptions between two consecutive images at $t - 1$ and $t$ can be defined as the image brightness constancy constraint [36], i.e., $dI/dt = 0$. Applying the Taylor expansion series to $dI/dt$, we obtain:

$$\frac{dI(x,y,t)}{dt} = \partial_t I + (\partial_t x)(\partial_x I) + (\partial_t y)(\partial_y I) = I_t + u I_x + v I_y = 0, \qquad (1)$$

where $I_x$, $I_y$, and $I_t$ are the partial derivatives $\partial_x$, $\partial_y$, and $\partial_t$ of image intensity $I(x,y,t)$ with respect to $x$, $y$, and $t$, respectively. The first derivatives of $(x, y)$ with respect to time $t$ give the optical flow (velocity) $\mathbf{u} = (u, v)$. It is noted that $I_t$ is determined from two consecutive image frames with respect to time. $\mathbf{u}$ is estimated from (1) in a least-squares manner. To enhance the accuracy of estimating $\mathbf{u}$, several constraints have been applied [17], [45], [58]–[68], [75]–[79]. In fluid dynamics, it is known that substance, i.e., fluid, is moved by velocity. This is called the advection phenomenon. For simplicity, this phenomenon can be described by a one-dimensional wave, i.e., $\sin(x - ut)$, where coordinate $x$, velocity $u$, and time $t$ are defined. By taking derivatives with respect to $x$ and $t$, we obtain $\partial_t \sin(x - ut) = -u\cos(x - ut)$ and $\partial_x \sin(x - ut) = \cos(x - ut)$, respectively. Next, taking these two equations and replacing $\sin(x - ut)$ with a function $f(x, t)$ yields (2):

$$\partial_t f = -u\,\partial_x f, \qquad (2)$$

where (2) is called the advection equation. Using known velocity $u$ and present $f$, the unknown next-time $f$ is extrapolated. Function $f$ in (2) can be extended to two-dimensional coordinates $f(x,y,t)$: $\partial_t f + u\,\partial_x f + v\,\partial_y f = 0$. From the above, equations (1) and (2) are derived differently, but they are found to be identical. Since $f$ is an arbitrary variable, $f$ can be replaced by image intensity $I(x,y,t)$. To summarize, the optical flow equation allows unknown velocity $\mathbf{u}$ to be estimated from known image intensity $I$, whereas from the advection equation, an unknown image $I$ can be extrapolated from known velocity, i.e., optical flow, and image intensity. Thus, this paper introduces a new viewpoint on (1), where two different roles of the equation can be derived by simply interchanging the known (I/u) with the unknown (u/I) variable for optical flow and extrapolation. Subsequently, in order to obtain pixel-based velocity and image intensity changes, (1) and (2) are discretized in images by spatial intervals $\Delta x = \Delta y = h$ and time $t = \Delta t \cdot n$: $n = 1, \ldots, N$, and pixel coordinates $(x, y) = (ih, jh)$ with integers $(i, j) \in R^2$ are defined. For simplicity, we denote $\Delta t = h = 1$, $I(i,j,n) \equiv I^n_{i,j}$, and $\mathbf{u}^n_{i,j} = (u^n_{i,j}, v^n_{i,j})$. For image region $\Omega$, velocity is estimated by (3); if necessary, constraints [36]–[46], i.e., smoothness of velocity, are added to the minimization/optimization. Using velocity (3) and a present image, a new image is extrapolated by (4); if necessary, other (physical) effects, i.e., diffusion, are added:

$$(u^n_{i,j}, v^n_{i,j}) = \arg\min_{u,v} \sum_{(i,j)\in\Omega} \big\| I^{n+1}_{i,j} - I^n_{i,j} + u^n_{i,j}(I^n_{i+1,j} - I^n_{i,j}) + v^n_{i,j}(I^n_{i,j+1} - I^n_{i,j}) \big\|^2 + \mathrm{Constraints}, \qquad (3)$$

$$I^{n+1}_{i,j} = I^n_{i,j} - u^n_{i,j}(I_x)^n_{i,j} - v^n_{i,j}(I_y)^n_{i,j} + \mathrm{Other\ effects}, \qquad (4)$$

where $\partial_t I \equiv I^{n+1}_{i,j} - I^n_{i,j}$ is the temporal gradient and $\partial_x I \equiv I^n_{i+1,j} - I^n_{i,j}$ and $\partial_y I \equiv I^n_{i,j+1} - I^n_{i,j}$ are spatial gradients. It is noted that $\partial_t I \equiv I^{n+1}_{i,j} - I^n_{i,j}$ in (3) only requires two consecutive images at time $n$ and $n + 1$, and that this is a forward temporal difference for future scenes. On the other hand, there is another extrapolation option, the use of $\partial_t I \equiv I^{n-1}_{i,j} - I^n_{i,j}$, which represents the backward temporal difference for past scenes: $I^{n-1}$ is a known variable while $I^n$ is an unknown variable. In Section III-D, (4) is solved and updated by our proposed method.

B. Initial optical flow estimation

For videos, our extrapolation method begins with optical flow estimation. There are a number of optical flow methods [17], [45], [58]–[68], [75]–[79]. Most are modeled using the framework of (3). The size of the image region $\Omega$ is either a subimage of the whole image or the whole image. For local texture change detection, since we deal with physics-based motion changes, this paper employs a physics-based optical flow method, i.e., the continuity equation and divergence-curl velocity constraint [17], the brightness variation constraint [45], and the wave physics-based constraint [78], using past and present images. In particular, since we deal with different image properties from rigid and elastic to fluid, and from continuous to discontinuous motion, the method in [78] is suitable for our purpose. As described below, the estimated optical flow is input to the extrapolation equations and the proposed energy optimization as the initial optical flow, i.e., velocity.
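For illustration, the following NumPy sketch estimates a single least-squares flow vector over a window in the spirit of (3) and advances one frame by the forward difference of (4). It is only a minimal stand-in: the initial flow in this paper comes from the physics-based method of [78] with additional constraints, and the function names here (lucas_kanade_window, advect_fdm) are illustrative, not from the paper.

```python
# Minimal sketch of (3) and (4), assuming grayscale frames in [0, 1]
# and one global (u, v) per window; per-pixel flow and the physical
# constraints used in the paper are omitted.
import numpy as np

def lucas_kanade_window(I0, I1):
    """Least-squares solution of I_t + u I_x + v I_y = 0 over a window."""
    Ix = np.gradient(I0, axis=1)   # spatial gradient along x
    Iy = np.gradient(I0, axis=0)   # spatial gradient along y
    It = I1 - I0                   # forward temporal difference
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

def advect_fdm(I, u, v):
    """One forward step of (4): I^{n+1} = I^n - u I_x - v I_y."""
    Ix = np.gradient(I, axis=1)
    Iy = np.gradient(I, axis=0)
    return I - u * Ix - v * Iy

# toy usage: a bright blob shifted by one pixel between two frames
I0 = np.zeros((32, 32)); I0[10:14, 10:14] = 1.0
I1 = np.roll(I0, 1, axis=1)
u, v = lucas_kanade_window(I0, I1)
I2 = advect_fdm(I1, u, v)          # extrapolated third frame
```

Swapping the temporal difference to the backward form, as noted above, would extrapolate toward past frames instead.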


C. Physics-based velocity extrapolation model

In order to extrapolate new videos from original videos, our proposed model is based on physical models in terms of velocity, i.e., optical flow, and image intensity. This section describes the former, while the latter is shown in Section III-D. Suppose that the initial velocity has been determined by (3) and that objects with image intensity in the video move and change according to (4). By extrapolating a time-varying new velocity, objects/scenes can be extrapolated with much higher degrees of deformation, i.e., fluid-like images and dynamic texture, than by extrapolating images with constant velocity over time. Scenes of nature demonstrate many physical phenomena such as rippling water and curtain-like aurora. Therefore, an extrapolation equation must cope with local image intensity and velocity changes as well as physical changes from pixels to regions. To satisfy these requirements, we employ the Navier-Stokes (NS) equation, as it can physically extrapolate from an initial velocity. In Section III-D, we introduce an image intensity extrapolation model. The NS equation with velocity and pressure is widely used in computational fluid dynamics, where free-surface motion, divergence-convergence motion, and vortexes are generated over time. Given the current velocity $\mathbf{u}$, the NS equation can extrapolate the next-time velocity as follows:

$$\frac{\partial \mathbf{u}}{\partial t} = -(\mathbf{u}\cdot\nabla)\mathbf{u} - \frac{1}{\rho}\nabla p + \nu\,\nabla^2 \mathbf{u}, \qquad (5)$$

where $\nabla \equiv \partial_x + \partial_y$ and $\nabla^2 \equiv \partial_{xx} + \partial_{yy}$. In (5), the left term and the first, second, and third terms on the right represent the temporal gradient, advection, pressure gradient, and diffusion, respectively. Viscosity $\nu$ and density $\rho$ specify nonlinear attributes ranging from linear laminar to turbulent flow. Raising the viscosity decreases the number of vortexes. Pressure $p$ can be calculated while solving the NS equation, where its initial value is zero. However, since pressure plays an auxiliary role in solving the NS equation, this paper does not directly use pressure, and only the velocity from (5) is applied to (4). On the other hand, since (5) is a non-linear equation with unstable properties, its stabilization is needed to obtain stable solutions, i.e., velocity. For this, moving objects are assumed to be incompressible, where the continuity equation with respect to velocity is written as:

$$\nabla \cdot \mathbf{u} = 0. \qquad (6)$$

This also reduces computation cost. In order to solve (5) and (6), the SOLA [89] algorithm is used in a non-linear manner (see the supplementary material for details of the algorithm and the full discretization of the equations). In this paper, we propose to optimize three parameters, $\nu$, $\rho$, and $\kappa$, in the NS and diffusion equations (see (18)); they were empirically determined in previous studies [58], [60], [62], [71], [87]. Sections III-E, -F, and -G show this optimization framework.
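A rough sketch of one explicit velocity update following (5) is given below, assuming periodic boundaries and unit grid spacing. The pressure/incompressibility treatment of (6), which this paper handles with the SOLA algorithm [89], is omitted, so this is not the full solver, only the advection-plus-diffusion part.

```python
# One explicit update of (5) without the pressure term; u, v are 2-D
# velocity arrays, nu the viscosity, dt the time step. Density/pressure
# handling (SOLA projection for (6)) is deliberately left out.
import numpy as np

def laplacian(a):
    return (np.roll(a, 1, 0) + np.roll(a, -1, 0) +
            np.roll(a, 1, 1) + np.roll(a, -1, 1) - 4.0 * a)

def ns_step(u, v, nu=0.5, dt=0.1):
    ux = 0.5 * (np.roll(u, -1, 1) - np.roll(u, 1, 1))  # du/dx
    uy = 0.5 * (np.roll(u, -1, 0) - np.roll(u, 1, 0))  # du/dy
    vx = 0.5 * (np.roll(v, -1, 1) - np.roll(v, 1, 1))  # dv/dx
    vy = 0.5 * (np.roll(v, -1, 0) - np.roll(v, 1, 0))  # dv/dy
    # advection -(u . grad)u plus viscous diffusion nu * laplacian(u)
    u_new = u + dt * (-(u * ux + v * uy) + nu * laplacian(u))
    v_new = v + dt * (-(u * vx + v * vy) + nu * laplacian(v))
    return u_new, v_new
```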
F ( X ) = aX + bX + cX + dX + eX + X , (9)


D. High order extrapolation model by CIP

As mentioned in Section III-A, we introduced our extrapolation method by using (2). However, when discretizing (2), the extrapolation equation (4) can degrade image quality over time due to discretization/quantization error. Therefore, this section proposes an extrapolation method that offers enhanced image quality. Without loss of generality, we introduce the image extrapolation model for one-dimensional coordinates as follows: We assume an object with value $f(x, t)$ at coordinate $x$ at time $t$. It is also assumed that when an object moves with velocity $u$ in a very small time step/interval $\Delta t$, values between time $0$ and $\Delta t$ are constant, where this relationship can be given by

$$f(x, \Delta t) = f(x - u\Delta t, 0). \qquad (7)$$

Since this is modeled by the advection equation with substance, i.e., image, $f$ and velocity $u$, we can derive an equation by rewriting (2) as follows:

$$\partial_t f + u\,\partial_x f = 0, \qquad (8)$$

where $\partial_t f$ and $u\,\partial_x f$ are called the temporal and advection term, respectively. In order to solve (8), we first need to pay attention to the velocity term, which is constant or variable with respect to time and position. The former formulation is much simpler than the latter. Our computational grids equally divide space $x$ and time $t$ using intervals $\Delta x$ and $\Delta t$, respectively. The (integer) $i$-th position and (integer) $n$-th time of $f$ is denoted by $f_i^n$. Most existing discretization methods use only discrete grids, i.e., $[i, i+1]$, with simple linear interpolation. The Finite Difference Method (FDM) has been widely used in computer vision, image processing, and numerical simulation. However, as shown in Fig. 1 between time $t + 1$ and $t + 2$, when moving/advecting a convex-concave profile by velocity, FDM can cause error over long time steps, and so degrade the original value of $f$ into an oversmoothed profile (left side of Fig. 1). This error, called dissipative error, involves error accumulation and insufficient approximation of the gradient term, i.e., the advection term. In extrapolating images/videos, the appearance of original texture, edge, and contour can be assumed to be degraded, i.e., blurred. To counter this issue, this paper proposes to introduce the Constrained Interpolation Profile (CIP) method [30]. CIP consists of non-linear interpolation between grid intervals; its advantageous characteristics include concurrently updating/advecting both a value and its derivative/gradient with non-linear interpolation.

[Fig. 1. Comparison of image extrapolation by (a) Finite Difference Method (FDM) and (b) Constrained Interpolation Profile (CIP) given a velocity from time t + 1 to t + 2. CIP uses both image intensities and gradients while FDM uses only image intensities. CIP can maintain the original profile over extrapolation.]

It is noted that FDM only updates a value, i.e., image intensity, over time. For CIP, an efficient two-step algorithm for solving (8) has been proposed. In the first step, the velocity in (8) is assumed to be constant. Subsequently, in the second step, the velocity in (8) is varied in both space and time.
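The dissipation that motivates Fig. 1 is easy to reproduce. The short script below, with purely illustrative values, advects a Gaussian pulse with first-order upwind FDM, whose exact solution is a pure shift, and prints how far the peak has decayed after 100 steps.

```python
# Numerical-diffusion demo behind Fig. 1: first-order upwind FDM
# flattens a sharp profile even though the exact solution only shifts.
import numpy as np

n, c = 200, 0.5                  # grid points, Courant number u*dt/dx
f = np.exp(-0.5 * ((np.arange(n) - 50) / 3.0) ** 2)   # Gaussian pulse
g = f.copy()
for _ in range(100):             # upwind update for u > 0
    g = g - c * (g - np.roll(g, 1))
print(f.max(), g.max())          # peak decays noticeably below 1.0
```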
(CIP: Constant velocity case)
We introduce the first-step algorithm: the constant velocity form of (8). For nonlinear interpolation between grid intervals, there are a number of nonlinear interpolation functions of different orders. In order to capture and represent finer image features, higher-order nonlinear interpolation functions may be appropriate for local texture and motion changes. In this paper, we propose to employ the fifth-order interpolation function between three points, i.e., grid intervals $[i-1, i+1]$, instead of the two points, i.e., $[i, i+1]$, used in the third-order interpolation function [30]. Suppose that, for simplicity, $x_{i-1} = -\Delta x$, $x_i = 0$ (origin), and $x_{i+1} = \Delta x$ are defined. On these grid points, known values and their gradient values are given, i.e., a total of six known values. Using the values and gradients among $i-1$, $i$, and $i+1$, we define the fifth-order approximation function $F(X)$ with six coefficients $a, b, c, d, e$, and $\chi$ at $X$, and its gradient $\partial_X F(X)$ with respect to $X$ is used:

$$F(X) = aX^5 + bX^4 + cX^3 + dX^2 + eX + \chi, \qquad (9)$$

$$\partial_X F(X) = 5aX^4 + 4bX^3 + 3cX^2 + 2dX + e, \qquad (10)$$

where $X = x - x_i$. Values and their derivatives on the grids are assumed to be continuous. From the six known values and their gradients, six boundary conditions are uniquely derived by:

$$F(-\Delta x) = f^n_{i-1}, \quad F(0) = f^n_i, \quad F(\Delta x) = f^n_{i+1}, \qquad (11)$$

$$\partial_x F(-\Delta x) = \partial_x f^n_{i-1} = g^n_{i-1}, \quad \partial_x F(0) = \partial_x f^n_i = g^n_i, \quad \partial_x F(\Delta x) = \partial_x f^n_{i+1} = g^n_{i+1}, \qquad (12)$$

where, for simplicity, $\partial_x f \equiv g$ is used. From (9) and (11), $F(0) = f^n_i = \chi$. From (10) and (12), $\partial_X F(0) = \partial_x f^n_i = g^n_i = e$. Likewise, the remaining four unknown coefficients $a, b, c$, and $d$ can be readily obtained at $\Delta x$ by using (9)–(12). We summarize the six coefficients of (9) and (10) by:

$$a = -\frac{3(f^n_{i+1} - f^n_{i-1})}{4\Delta x^5} + \frac{g^n_{i+1} + 4g^n_i + g^n_{i-1}}{4\Delta x^4}, \quad
b = -\frac{f^n_{i+1} - 2f^n_i + f^n_{i-1}}{2\Delta x^4} + \frac{g^n_{i+1} - g^n_{i-1}}{4\Delta x^3},$$

$$c = \frac{5(f^n_{i+1} - f^n_{i-1})}{4\Delta x^3} - \frac{g^n_{i+1} + 8g^n_i + g^n_{i-1}}{4\Delta x^2}, \quad
d = \frac{f^n_{i+1} - 2f^n_i + f^n_{i-1}}{\Delta x^2} - \frac{g^n_{i+1} - g^n_{i-1}}{4\Delta x},$$

$$e = g^n_i, \quad \chi = f^n_i. \qquad (13)$$

From the above equations, we can obtain value $f$ at time $n$. Using (7), the value $f$ at the next time $n + 1$ can be extrapolated by

$$f(x_i, t + \Delta t) = f(x_i - u_i\Delta t, t). \qquad (14)$$

Finally, from (7), (9), and (11), the extrapolated value $f^{n+1}_i$ is updated with (9) according to:

$$f^{n+1}_i = F(x_i - u\Delta t) = a\xi^5 + b\xi^4 + c\xi^3 + d\xi^2 + e\xi + \chi, \qquad (15)$$

where $\xi = -u\Delta t$. In addition, the gradient $g^{n+1}_i$ of $f^{n+1}_i$, i.e., $\partial_x f \equiv g$, is derived by taking the derivative of (8):

$$\partial_t g + u\,\partial_x g = 0. \qquad (16)$$

Following (10) and (13), the extrapolated gradient $g^{n+1}_i$ is updated by:

$$g^{n+1}_i = \partial_x F(x_i - u\Delta t) = 5a\xi^4 + 4b\xi^3 + 3c\xi^2 + 2d\xi + e. \qquad (17)$$
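A one-dimensional sketch of this constant-velocity CIP update follows, with the coefficients of (13) rederived from the boundary conditions (11)–(12) (signs lost in the typesetting above are reconstructed from those conditions). Periodic boundaries, $\Delta x = 1$, and $|u\Delta t| \le \Delta x$ are assumed.

```python
# 1-D fifth-order CIP update ((13), (15), (17)); f is the value array,
# g its spatial gradient, u a constant velocity. dx = 1, periodic ends.
import numpy as np

def cip5_step(f, g, u, dt):
    fp, fm = np.roll(f, -1), np.roll(f, 1)     # f_{i+1}, f_{i-1}
    gp, gm = np.roll(g, -1), np.roll(g, 1)     # g_{i+1}, g_{i-1}
    dx = 1.0
    a = -3*(fp - fm)/(4*dx**5) + (gp + 4*g + gm)/(4*dx**4)
    b = -(fp - 2*f + fm)/(2*dx**4) + (gp - gm)/(4*dx**3)
    c =  5*(fp - fm)/(4*dx**3) - (gp + 8*g + gm)/(4*dx**2)
    d =  (fp - 2*f + fm)/dx**2 - (gp - gm)/(4*dx)
    xi = -u * dt                                # departure offset
    f_new = ((((a*xi + b)*xi + c)*xi + d)*xi + g)*xi + f    # (15)
    g_new = (((5*a*xi + 4*b)*xi + 3*c)*xi + 2*d)*xi + g     # (17)
    return f_new, g_new
```

A quick sanity check: for a linear profile f_i = i with g = 1, all of a, b, c, d vanish and the update reduces to the exact shift f - u*dt.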
(Varying velocity case)
In the above, we discussed the extrapolation equations (15) and (17) for constant velocity. Next, this section examines the varying velocity case of advection equation (8), to which a function $h$ is added:

$$\partial_t f + \partial_x(uf) = h, \qquad (18)$$

where $h$ can be any effect, i.e., a diffusion effect with a diffusion coefficient $\kappa$, i.e., $h = \kappa\,\partial_{xx} f$. Expanding the derivative in the second term on the left side of (18) gives

$$\partial_t f + u\,\partial_x f + f\,\partial_x u = h. \qquad (19)$$

In order to efficiently solve and extrapolate data, (19) is separated into two term sets: advection and non-advection phases. From (19), the advection and non-advection phases can be shown on the left and right sides of (20), respectively:

$$\partial_t f + u\,\partial_x f = h - f\,\partial_x u. \qquad (20)$$

In (20), for convenience, we write $H$ for the right side: $\partial_t f + u\,\partial_x f = h - f\,\partial_x u \equiv H$. With $\partial_x f = g$, the gradient of advection is given by (21):

$$\partial_t g + u\,\partial_x g = \partial_x h - \partial_x f\,\partial_x u. \qquad (21)$$

It is known that the above equations can be efficiently solved in two steps: an advection and a non-advection phase [89]. This is called the semi-Lagrangian method.

(Advection phase)
From (20) and (21), we extract the equations that contain only advection terms with respect to $f$ and $g$:

$$\partial_t f + u\,\partial_x f = 0, \qquad (22)$$

$$\partial_t g + u\,\partial_x g = 0. \qquad (23)$$

(Non-advection phase)
From (20) and (21), we extract the equations that contain only non-advection terms with respect to $f$ and $g$:

$$\partial_t f = H, \qquad (24)$$

$$\partial_t g = \partial_x h - \partial_x f\,\partial_x u. \qquad (25)$$

In the above, extrapolation with constant velocity can be iteratively obtained by $(f^n, g^n) \rightarrow (f^{n+1}, g^{n+1})$ using (13), (15), and (17). In the next step, extrapolation with time-varying velocity is taken into consideration. Therefore, an intermediate time $*$ between $n$ and $n + 1$ is introduced and the two-step algorithm is used to solve equations (22) to (25): $(f^n, g^n) \rightarrow (f^*, g^*)$, $(f^*, g^*) \rightarrow (f^{n+1}, g^{n+1})$. We discretize (24) and (25) into (26) and (27), respectively:

$$f^{n+1}_i = f^*_i + H_i\,\Delta t, \qquad (26)$$

$$g^{n+1}_i = g^*_i + \frac{H_{i+1} - H_{i-1}}{2\Delta x}\,\Delta t - g^*_i\,\frac{u_{i+1} - u_{i-1}}{2\Delta x}\,\Delta t. \qquad (27)$$

Using $H_i = (f^{n+1}_i - f^*_i)/\Delta t$, (27) can be modified to:

$$g^{n+1}_i = g^*_i + \frac{(f^{n+1}_{i+1} - f^*_{i+1}) - (f^{n+1}_{i-1} - f^*_{i-1})}{2\Delta x\,\Delta t}\,\Delta t - g^*_i\,\frac{u_{i+1} - u_{i-1}}{2\Delta x}\,\Delta t. \qquad (28)$$

It is noted that the above $H$ contains a diffusion effect/equation. Thus, by using (26) and (28), a new value and its gradient at time $n + 1$ can be iteratively updated from the present value and gradient at time $n$: $(f^n, g^n) \rightarrow (f^*, g^*) \rightarrow (f^{n+1}, g^{n+1})$ from the initial time $n = 1$ to the user-defined $N$. Note that we wrote the value $f$ and its gradient $g$ for the one-dimensional case, but it is easy to extend them to two dimensions. Finally, the value $f$ and its gradient $g$ are replaced by image intensity and its gradient, respectively. For color videos, the color components, i.e., red, green, and blue, are independently extrapolated.
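The non-advection phase, (26) and (28), can be sketched as below, assuming f_star and g_star come from the advection (CIP) phase and taking $h = \kappa\,\partial_{xx} f$ as the diffusion effect; again a minimal 1-D, periodic-boundary version rather than the full 2-D scheme.

```python
# Non-advection phase of the varying-velocity case, (26) and (28),
# with H = h - f * du/dx and h = kappa * f_xx. dx and dt are unit-free
# illustrative values; f_star, g_star are outputs of the CIP phase.
import numpy as np

def non_advection_phase(f_star, g_star, u, kappa=0.1, dt=0.1, dx=1.0):
    f_xx = np.roll(f_star, -1) - 2*f_star + np.roll(f_star, 1)
    du = (np.roll(u, -1) - np.roll(u, 1)) / (2*dx)   # du/dx, central
    H = kappa * f_xx / dx**2 - f_star * du
    f_new = f_star + H * dt                                    # (26)
    df = f_new - f_star                                        # = H*dt
    g_new = (g_star
             + (np.roll(df, -1) - np.roll(df, 1)) / (2*dx)     # (28)
             - g_star * du * dt)
    return f_new, g_new
```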


E. Proposed energy optimization

In image extrapolation, since the texture and motion of images should change simultaneously over time, a constraint for image quality consistency is needed. This can be realized if the model parameters of the extrapolation equations are optimized. However, it is generally difficult to uniquely determine the three parameters, i.e., $\nu$, $\rho$, $\kappa$, of the Navier-Stokes (NS) and diffusion equations. Existing methods empirically determine the model parameters of NS equations and other equations [8], [10], [11], [14], [24], [27], [28], [29], [30], [39], [50], [51], [58], [59], [63], [71], [87]. In order to efficiently and effectively balance them, this paper proposes an image-feature based energy optimization framework. For this, we introduce a physical model, i.e., the Kolmogorov turbulence energy spectrum from turbulence theory [33], [34]. The profile is shown in Fig. 2, which plots wave number vs. energy. This energy spectrum explains how fluid is activated in the contexts of wave number (frequency) and energy. Fluid with higher energy (low frequency) can transport its energy to fluid with lower energy (high frequency). In higher frequency bandwidths, energy is lost by the viscosity force. On the other hand, in lower frequency bandwidths, vortex energy is maintained due to a certain energy supply.

[Fig. 2. Kolmogorov turbulence energy of fluid phenomena: wave number k vs. Kolmogorov energy E(k). Lower energy than the initial energy indicates energy loss that corresponds to image degradation.]

This statistically results in a single-mode profile. Therefore, to approximate this profile, this paper uses the probability density function of a Gamma distribution ($x > 0$):

$$G(x;\,sh,\,sc) = \frac{x^{sh-1}\exp(-x/sc)}{sc^{\,sh}\,\Gamma(sh)}, \qquad (29)$$

where $sh$ (integer) $> 0$ and $sc > 0$ are the shape and scale parameters, respectively. Physical evidence from fluid dynamics [34] indicates that turbulence can persist over long life cycles if these energies are balanced over time. Thus, this paper proposes optimization based on total energy conservation between the initial image and subsequent extrapolated images. We call this the energy balancer. Further, it is assumed that the corruption of the initial image quality and physical properties will be minimized by applying this law over time. Since it relates to the wave property, we call it the wave energy. In the following, we describe how to estimate and use the wave energy. The initial velocity at $n$ is estimated by optical flow. This velocity is, at each pixel, transformed into frequency via the Fast Fourier Transform (FFT), where the wave number $k$ equals $2\pi \times$ frequency. We approximate the obtained wave number distribution as a Gamma function in a nonlinear least-squares manner. This is because the obtained wave numbers will be too sparse, so such a function can approximate the original data as continuous data. We define the wave energy as:

$$(Eng_1)^n = \sum_{i,j}\frac{1}{2}\,(k^n_{i,j})^2. \qquad (30)$$

Since the wave energy presents only one property of a moving object, we include other energies. For characterizing local image features, velocity and texture based energies are used. First, local kinematic (motion) energy is defined as a function of velocity. The faster an object moves, the higher its energy is. This is denoted by:

$$(Eng_2)^n = \sum_{i,j}\frac{1}{2}\,|\mathbf{u}^n_{i,j}|^2, \qquad (31)$$

where $|\mathbf{u}_{i,j}| = \sqrt{(u_{i,j})^2 + (v_{i,j})^2}$. Second, non-linear motion patterns can be present in videos, where rotation and divergence [17] may be locally mixed. Therefore, local vortex related energy is defined as the combination of two energies:

$$(Eng_3)^n = \sum_{i,j}\frac{1}{2}\left((\mathrm{div}\,\mathbf{u}^n_{i,j})^2 + (\mathrm{curl}\,\mathbf{u}^n_{i,j})^2\right), \qquad (32)$$

where $\mathrm{div}\,\mathbf{u}_{i,j} = 0.5\{(u_{i,j+1} - u_{i,j-1}) + (v_{i+1,j} - v_{i-1,j})\}$ and $\mathrm{curl}\,\mathbf{u}_{i,j} = 0.5\{(v_{i,j+1} - v_{i,j-1}) - (u_{i+1,j} - u_{i-1,j})\}$. The number and magnitude of the vortex and growing/decaying regions impact this energy. In addition to the above motion-related energies, texture complexity can be quantified by image gradients from blurred to sharp. Therefore, edge energy is defined as the image gradient energy as follows:

$$(Eng_4)^n_{i,j} = \frac{1}{2}\,|\nabla I^n_{i,j}|^2, \qquad (33)$$

where $|\nabla I_{i,j}| = \sqrt{\{(I_x)_{i,j}\}^2 + \{(I_y)_{i,j}\}^2}$ and $\nabla \equiv \partial_x + \partial_y$. Integrating all the energies (30)–(33) yields the total energy at each time step. The four energy functions are weighted by $\alpha_c$'s, i.e., 0.25, in the two-dimensional region $\Omega$ (M-by-N pixels):

$$E(n) = \frac{1}{MN}\sum_{(i,j)\in\Omega}\sum_{c=1}^{4}\alpha_c\,(Eng_c)^n_{i,j}, \qquad (34)$$

where $\sum_{c=1}^{4}\alpha_c = 1$. We do not constrain the extrapolation size, shape, texture, or motion of dynamic textures; owing to this, their visual and physical coherency are maintained over time during extrapolation.
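The four energies and the total (34) can be sketched as below. The wave-energy term here is one plausible reading of (30), a spectral-power-weighted mean of $\frac{1}{2}k^2$ without the Gamma-distribution smoothing step, so treat that part as an assumption rather than the exact computation.

```python
# Sketch of the image-feature energies (31)-(33) and the weighted
# total (34); the eng1 term approximates (30) and assumes a non-zero
# flow field. All weights alpha_c = 0.25, as in the text.
import numpy as np

def total_energy(I, u, v):
    M, N = I.shape
    # (30) wave energy: mean 1/2 k^2 weighted by velocity power spectrum
    kx = 2*np.pi*np.fft.fftfreq(N); ky = 2*np.pi*np.fft.fftfreq(M)
    k2 = kx[None, :]**2 + ky[:, None]**2
    P = np.abs(np.fft.fft2(u))**2 + np.abs(np.fft.fft2(v))**2
    eng1 = 0.5 * np.sum(k2 * P) / P.sum()
    # (31) kinematic energy
    eng2 = 0.5 * np.sum(u**2 + v**2)
    # (32) divergence/curl energy via central differences
    ux = 0.5*(np.roll(u, -1, 1) - np.roll(u, 1, 1))
    uy = 0.5*(np.roll(u, -1, 0) - np.roll(u, 1, 0))
    vx = 0.5*(np.roll(v, -1, 1) - np.roll(v, 1, 1))
    vy = 0.5*(np.roll(v, -1, 0) - np.roll(v, 1, 0))
    eng3 = 0.5 * np.sum((ux + vy)**2 + (vx - uy)**2)
    # (33) edge energy from image gradients
    Ix = np.gradient(I, axis=1); Iy = np.gradient(I, axis=0)
    eng4 = 0.5 * np.sum(Ix**2 + Iy**2)
    return 0.25 * (eng1 + eng2 + eng3 + eng4) / (M * N)   # (34)
```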
F. Energy optimization during extrapolation

In order to maintain the original video image quality, this paper proposes an energy minimization framework. For this, we first assume that the initial total energy, (34), is maintained over time. Thus, the absolute difference between the initial total energy $E(n)$ at time $n$ and the extrapolated energy $E(n + p)$ at time $n + p$ should be minimized. At every step $n + p$ ($1 \le p \le N$),

$$|\Delta_1| = |E(n+1) - E(n)|,\ \ |\Delta_2| = |E(n+2) - E(n)|,\ \ldots,\ \ |\Delta_N| = |E(n+N) - E(n)|$$

are computed by changing the three model parameters, $\nu$, $\rho$, $\kappa$:

$$(\nu, \rho, \kappa)_p = \arg\min_{\nu,\rho,\kappa} |E(n+p) - E(n)|, \qquad (35)$$

where the minimization of (35) is continued until a minimum error is found within the ranges of the three parameters: $0.1 \le \nu \le 5.0$, $0.1 \le \rho \le 5.0$, and $0.1 \le \kappa \le 5.0$. From (35), the three modified model parameters $(\nu, \rho, \kappa)$ are used for the Navier-Stokes (5) and advection ((26), (28)) equations, where the extrapolated image intensity with the extrapolated velocity minimizes the image degradation of fluid-like images and dynamic texture. When extrapolating moving rigid objects with shapes, the energy modification may be smaller, so the apparent texture may also remain unchanged over time.
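Since the search procedure for (35) is not spelled out beyond the parameter ranges, the sketch below uses a plain brute-force grid over 0.1–5.0; extrapolate_frame and total_energy stand for the pipeline of Sections III-C, III-D, and III-E and are assumed to be supplied by the caller.

```python
# Brute-force sketch of (35): pick (nu, rho, kappa) so that the energy
# of the next extrapolated frame stays closest to the initial E0.
import itertools
import numpy as np

def update_parameters(I, u, v, E0, extrapolate_frame, total_energy):
    best, best_err = None, np.inf
    grid = np.linspace(0.1, 5.0, 8)          # coarse 0.1..5.0 range
    for nu, rho, kappa in itertools.product(grid, grid, grid):
        I1, u1, v1 = extrapolate_frame(I, u, v, nu, rho, kappa)
        err = abs(total_energy(I1, u1, v1) - E0)
        if err < best_err:
            best, best_err = (nu, rho, kappa), err
    return best
```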


G. Outline of our proposed algorithm

We outline our energy-optimization based video extrapolation method in Fig. 3. It consists of two major steps:
(step-1) Motion estimation: optical flow as velocity is extracted from the images at $n - 1$ (past) and $n$ (present). The initial total energy from the image-feature based energies is stored using (34).
(step-2) Video extrapolation based on the energy balancer: new images after time $n + 1$ are extrapolated using (26) and (28). The advection equations with the fifth-order Constrained Interpolation Profile, and the Navier-Stokes (NS) and continuity equations, are used to update image intensity, gradient, and velocity. To maintain image quality over time, energy optimization is carried out by minimizing the difference between the initial and new total energy using (35). From this, a new/modified velocity is obtained by the NS equation with updated model parameters. Finally, the video is extrapolated with the advection equation and the new velocity.

[Fig. 3. Our two-step video extrapolation method from two images using optical flow, the advection equation, and the NS equation, where optimization is conducted using a time-varying energy balancer assuming the consistency of image-feature based energies over time.]

IV. EXPERIMENTS AND DISCUSSION

In order to validate our video extrapolation method, we conducted many experiments using various videos containing fluid-like images, dynamic textures, and moving camera scenes. In particular, since fluid-like moving objects and dynamic textures have no definite structure, most mesh-based models have difficulty in extrapolating new frames with less degradation. Therefore, we selected, as our baseline methods, two no-mesh state-of-the-art video extrapolation methods: PDPP [3] and HOSVD [4] (see also the other results in the supplementary videos). In all extrapolation experiments, our method required only two consecutive images, while the benchmarks needed more than 5 frames depending on the scenes and moving objects. Quantitative evaluations are discussed in Sections IV-D, -E, and -F. A number of challenging short- and long-term videos were used to compare our extrapolation method with state-of-the-art methods. (See the supplemental material for magnified images of the results.)

A. Video extrapolation of short-term moving objects

Short-term video extrapolation experiments, i.e., one or a few frames ahead, were conducted. Rippling wave and aurora videos were used as shown in Figs. 5(a) and 6(a), respectively: (a) is one of the ground truth images (#1–#230). Red arrows represent a rough motion sketch.

[Fig. 5. Comparison of video extrapolation using a rippling wave video: (a) Ground truth. (b) Our proposed method. (c) Optical flow used by our method as the initial velocity field. (d) PDPP [3]. (e) HOSVD [4].]

[Fig. 6. Comparison of video extrapolation for an aurora video: (a) Ground truth. (b) Proposed method. (c) Optical flow used by the proposed method as the initial velocity field. (d) PDPP [3]. (e) HOSVD [4].]

In Fig. 5(a), water flows with semi-transparent sky-blue reflections from left to right, with small and large waves over time. On the other hand, in Fig. 6(a), an aurora, a curtain-like colorful semi-transparent natural phenomenon, changes local color and texture over time. Figs. 5 and 6 show the extrapolation results of our method (b) with optical flow (c), PDPP (d), and HOSVD (e). In (c), the optical flow estimated from consecutive images is used by our method as the initial velocity but is modified based on our proposed energy conservation model. In Figs. 5 and 6, the optimized physical parameters used in the NS equation were $(\nu, \rho, \kappa)_{\mathrm{Fig.\,5}}$ = (3.5, 1.4, 1.2) and $(\nu, \rho, \kappa)_{\mathrm{Fig.\,6}}$ = (1.0, 0.3, 4.2), respectively. In the previous methods (d) and (e), as compared with the ground truth image (a), local wave image features were more degraded than those of our method (b): less water surface ripple and fewer illumination reflections are evident. Also, repeating or random motions were generated in the videos. This degradation is due to the loss of high frequency components, which have been low-pass-filtered and/or compressed by the limited number, i.e., hundreds or more, of orthogonal bases used in the state-of-the-art methods (d), (e). It is also noted that the images extrapolated by the benchmarks, even just one step ahead, are very unnatural.

To better understand the performance of our method, we carried out a longer extrapolation experiment, i.e., 60 frames ahead; Fig. 7 shows a shrinking sequence of an ultrasound echo heart video, where our proposed video extrapolation method is compared with the two state-of-the-art video extrapolation methods (PDPP and HOSVD).

[Fig. 7. Comparison of video extrapolation using a shrinking-phase heart video: (a) Initial ground truth image. (b) Initial optical flow for the proposed extrapolation method. (c) 60th-frame ground-truth image from (a). (d), (e), and (f) 60th-frame image extrapolated by our method, PDPP [3], and HOSVD [4], respectively.]

Fig. 7(a) shows the initial video image, which begins to shrink. There is a semi-transparent biological membrane enveloping the heart, which also shrinks. (b) is the estimated initial optical flow for our method using (a) and the next frame. (c) shows the 60th-frame ground-truth image from (a). (d), (e), and (f) show the 60th-frame images extrapolated by our method, PDPP [3], and HOSVD [4], respectively. In (d), the initial optical flow (roughly sketched by the red arrows in (b)) was predicted by our proposed optimization process. PDPP extrapolation yielded the most degraded image (e). The images extrapolated by our method (d) and HOSVD (f) were closest to (c). It is noted that in (f) the appearance is somewhat clear, but no shrinking motion with texture can be recognized. Rather, flickering motion was extrapolated over time.


than state-of-the-art video extrapolation methods [3], [4]. vectors for visual assistance. Previous extrapolation methods
produced merely averaged videos and failed to generate realis-
tic past or future videos.
D. Quantitative evaluation of extrapolation methods
In order to verify the effectiveness of our method with regard
to fifth-order Constrained Interpolation Profile (CIP) and en-
ergy balancer, we conducted a quantitatively evaluation using
Figs. 510. Absolute average difference (error in image inten-
sity per pixel) between ground truth and extrapolated images is
determined. We also examined our proposed extrapolation
method with/without energy balancer. PDPP [3] and HOSVD
[4] are used. For our method, all frame pairs are used. For ex-
isting methods, every set of 5100 consecutive frames are used
to learn model parameters, where they remained constant dur-
ing extrapolation. Note that the number of learning images is
different in each of Figs. 510. In Figs. 5 and 6, initial frames
are shifted from frame #51 to #200 and extrapolation errors, i.e.,
(image) intensity/pixel, were averaged.
Fig. 11 summarizes the video extrapolation errors. Since our
video extrapolation method relies on optical flow, we added
(a) (b) (c) and compared three physics-based optical flow methods with
different constraints: continuity equation and divergence-curl
velocity constraint (fw1) [17], brightness variation constraint
(fw2) [45], and wave physics-based constraint (fw3) [78]. In
Fig. 11, PDPP [3] yielded the largest average errors while our
method with energy balancer achieved the lowest average er-
rors. Particularly, our methods found that the three optical flow
methods demonstrated different levels of effectiveness. Among
(d) (e) them, our proposed physics-based optical flow method (fw3)
Fig. 9. Comparison of video extrapolation using a Rubik cube se- [78] extrapolated videos with lowest average errors, following
quence on a rotating stage: (a) Ground truth. (b) Our proposed method. (fw2) [45] and (fw1) [17].
(c) Synthesis error. (d) PDPP [3]. (e) HOSVD [4].
C. Video extrapolation: past and future
In order to demonstrate the usefulness of the proposed method, extrapolation up to -20 (past) and +20 (future) frames from time 0 (present) was carried out. As explained in Section III-A, past and future videos can easily be synthesized by changing the temporal term approximation between backward and forward temporal differences, as sketched below. As shown in Fig. 10, we selected a video of non-rigid moving and deforming objects: lava (molten rock) erupting from the ground.
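As a minimal sketch of this switch (our illustration with generic first-order differences; the actual solver uses the CIP discretization), the advection equation (2) can be stepped forward or backward in time by flipping the sign of the temporal update:

\[
\frac{\partial I}{\partial t} + u\,\frac{\partial I}{\partial x} + v\,\frac{\partial I}{\partial y} = 0,
\qquad
I^{\,n+1} = I^{\,n} - \Delta t\,\big(u\,I_x^{\,n} + v\,I_y^{\,n}\big)\ \text{(future)},
\qquad
I^{\,n-1} = I^{\,n} + \Delta t\,\big(u\,I_x^{\,n} + v\,I_y^{\,n}\big)\ \text{(past)},
\]

where \(I^{\,n}\) is the image intensity of frame \(n\) and \((u, v)\) is the estimated optical flow.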

Fig. 10. Past and future videos from the present images, lava, by our proposed video extrapolation method: upper column (ground truth) and lower column (our proposed method).

Our proposed extrapolated images (lower column of Fig. 10) became close to the ground-truth unimodal shapes (upper column of Fig. 10), where red arrows indicate rough motion vectors for visual assistance. Previous extrapolation methods produced merely averaged videos and failed to generate realistic past or future videos.

D. Quantitative evaluation of extrapolation methods

In order to verify the effectiveness of our method with regard to the fifth-order Constrained Interpolation Profile (CIP) and the energy balancer, we conducted a quantitative evaluation using Figs. 5-10. The absolute average difference (error in image intensity per pixel) between the ground-truth and extrapolated images is determined, as sketched below. We also examined our proposed extrapolation method with and without the energy balancer. PDPP [3] and HOSVD [4] are used for comparison. For our method, all frame pairs are used. For the existing methods, every set of 5-100 consecutive frames is used to learn the model parameters, which remain constant during extrapolation. Note that the number of learning images differs in each of Figs. 5-10. In Figs. 5 and 6, the initial frames are shifted from frame #51 to #200, and the extrapolation errors, i.e., (image) intensity/pixel, are averaged.
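As a sketch of this error measure (a hypothetical helper, not the paper's code), the per-frame score can be computed in a few lines of Python/NumPy; the per-video scores in Fig. 11 average this value over all extrapolated frames:

    import numpy as np

    def average_abs_error(ground_truth, extrapolated):
        # Mean absolute intensity difference per pixel between two
        # equally sized grayscale frames.
        gt = np.asarray(ground_truth, dtype=np.float64)
        ex = np.asarray(extrapolated, dtype=np.float64)
        return np.mean(np.abs(gt - ex))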
Fig. 11 summarizes the video extrapolation errors. Since our video extrapolation method relies on optical flow, we also added and compared three physics-based optical flow methods with different constraints: the continuity equation with a divergence-curl velocity constraint (fw1) [17], the brightness variation constraint (fw2) [45], and the wave physics-based constraint (fw3) [78], contrasted below.
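In rough form (our paraphrase; see [17], [45], and [61] for the exact formulations), the data terms behind these constraints differ as follows, for image intensity \(I\) and flow \(\mathbf{w} = (u, v)\):

\[
I_t + \nabla I \cdot \mathbf{w} = 0 \quad \text{(classical brightness constancy [61])},
\]
\[
I_t + \nabla \cdot (I\,\mathbf{w}) = I_t + \nabla I \cdot \mathbf{w} + I\,\nabla \cdot \mathbf{w} = 0 \quad \text{(continuity equation, fw1 [17])},
\]
\[
\frac{dI}{dt} = h(I; \mathbf{a}) \quad \text{(parametric brightness variation, fw2 [45])}.
\]

That is, fw1 tolerates divergent (compressible) motion and fw2 tolerates intensity change along a trajectory, while fw3 [78] further incorporates wave physics-based terms.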
In Fig. 11, PDPP [3] yielded the largest average errors, while our method with the energy balancer achieved the lowest. In particular, the three optical flow methods showed different levels of effectiveness within our framework. Among them, our proposed physics-based optical flow method (fw3) [78] extrapolated videos with the lowest average errors, followed by (fw2) [45] and (fw1) [17].

Fig. 11. Comparison of 8 video extrapolation methods in terms of the average extrapolation error for the videos in Figs. 5-10. Two previous methods (PDPP [3], HOSVD [4]) are used. In our proposed method, the effectiveness of three physics-based optical flow methods is compared using fw1 [17], fw2 [45], and fw3 [78], with and without the energy balancer. In our method, fifth-order CIP is used. "F.", "eng", "w/o", and "w" denote "Figure", "energy balancer", "without", and "with", respectively. For guidance, a black arrow indicates HOSVD.

It is noted that, even with (fw1) [17], our physics-based video extrapolation method, shown by the red dotted arrows, outperformed the two state-of-the-art methods, PDPP [3] and HOSVD [4]. The image degradation of PDPP and HOSVD was due to low-pass filtering and/or approximation using a limited number of orthogonal bases. Unlike such high-frequency component truncations, our truncation-free extrapolation method retains the original high-frequency components. Thus, the proposed energy-based extrapolation method with fifth-order CIP, the energy balancer, and the physics-based optical flow method (fw3) [78] has proven effective for fluid-like images and dynamic textures, as well as for moving rigid objects/scenes.

E. Quantitative evaluation of the advection equation

In this section, we evaluate the approximation accuracy of the proposed fifth-order CIP (C5) for the advection equation by comparing it with the third-order CIP (C3, the existing method [30]), Stam's methods with first- (S1), second- (S2), and third-order (S3) approximations [27], and the Finite Difference Method (FDM) with first- (F1), second- (F2), and third-order (F3) upwind differencing [34]. With and without the proposed energy balancer, we therefore compare a total of 16 approximation methods for advection equation (2). For the initial optical flow estimation, optical flow method (fw3) was used for all 16 methods. It is noted that S3 without the energy balancer corresponds to our previous FDM-based extrapolation method [87].
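For reference, a minimal 1-D sketch of the lowest-order baseline in this comparison, first-order upwind differencing (F1), is given below; this is illustrative code under periodic boundaries, not the paper's implementation. The higher-order FDM and Stam schemes, and especially CIP, which also advects the spatial derivatives of I, reduce the numerical diffusion that F1 introduces:

    import numpy as np

    def upwind_advect_1d(I, u, dt, dx):
        # One explicit step of the advection equation I_t + u I_x = 0
        # using first-order upwind differencing (F1), periodic boundaries.
        # Stable under the CFL condition |u| * dt / dx <= 1.
        dI_back = I - np.roll(I, 1)     # I[i] - I[i-1], used where u > 0
        dI_fwd = np.roll(I, -1) - I     # I[i+1] - I[i], used where u <= 0
        Ix = np.where(u > 0, dI_back, dI_fwd) / dx
        return I - dt * u * Ix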
Fig. 12 shows the extrapolated video results for the sequences of Figs. 5-10.

Fig. 12. Comparison of 16 approximation methods (CIP, FDM, Stam) for the advection equation in our method, with and without the energy balancer; the measure is the average extrapolation error of the videos in Figs. 5-10. "F.", "w", and "eng" denote "Figure", "with", and "energy balancer", respectively. For guidance, a black arrow indicates HOSVD.
For all videos, our proposed method with fifth-order CIP and the energy balancer, shown by red dotted arrows, synthesized the video sequences with the least errors. F1 and F3 closely approached S1 and S3, respectively. Therefore, the advection equation discretized by fifth-order CIP plays an important role, along with our proposed energy balancer, in minimizing the degradation in image quality. Moreover, the proposed extrapolation method with time-varying model parameters achieves higher extrapolation accuracy than our previous prediction method [87] with FDM and constant model parameters (S3).
F. Comparison of computational complexity

This section compares the state-of-the-art non-physics-based methods, PDPP [3] and HOSVD [4], with our proposed method in terms of computational complexity. Table I summarizes the results, where the total order is estimated by summing the two phases of learning (O_L) and extrapolation (O_E); MN, F, R, L, P, and S stand for the image size (M x N pixels), the number of learning frames, the iteration number, the number of orthogonal bases, the number of extrapolation frames, and the number of parameter combinations, respectively. PDPP and HOSVD incur a high computational cost in learning due to their use of Principal Component Analysis (PCA) and Singular Value Decomposition (SVD), respectively; HOSVD applies SVD more than three times. That is, since PDPP and HOSVD use a large number of learning frames, F ~ O(10^2), to enhance the extrapolated image quality, the matrices in PCA and SVD can become enormous. Particularly in PDPP, such large matrices are additionally subjected to the fast Fourier transform (FFT).

TABLE I
Comparison of the state-of-the-art methods and the proposed video extrapolation method in terms of computational complexity.

On the other hand, the extrapolation phase has a much smaller computational complexity than the learning phase. Therefore, the total computational complexities of PDPP and HOSVD are of order ~O(10^8) under the model orders given in the note of Table I. In contrast, the proposed extrapolation method has a smaller total order of ~O(10^6); in our method, the extrapolation phase has a higher computational cost than the learning phase. The FFT is also used in our method, but it is applied to a much smaller matrix than in PDPP. The times taken to move from learning to extrapolating one frame, averaged over the videos of the extrapolation experiments, were 80, 110, and 15 sec. for PDPP, HOSVD, and the proposed method, respectively (Intel Core i7, 3.0 GHz).
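As a rough illustration of these orders (the magnitudes below are our assumptions for a typical setting, not entries copied from Table I): for frames of MN ≈ 10^4 pixels and F ≈ 10^2 learning frames, the PCA/SVD of an MN x F data matrix costs on the order of O(MN F^2) ≈ 10^4 x 10^4 = 10^8 operations, which is consistent with the learning-dominated totals of PDPP and HOSVD, whereas extrapolating P ≈ 10^2 frames at a per-frame cost of roughly O(MN) pixel updates remains near 10^2 x 10^4 = 10^6.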

Thus, it has been confirmed that our video extrapolation method outperforms the state-of-the-art non-physics-based extrapolation methods PDPP and HOSVD in terms of computational complexity (Table I) as well as video quality (Figs. 11 and 12).

V. CONCLUSION

We have presented a novel video extrapolation method based on time-varying, physics-based energy optimization. For videos with fluid-like images, dynamic textures, and moving rigid objects/scenes, our video extrapolation method outperforms state-of-the-art video extrapolation methods. On the other hand, for videos with discontinuities, such as sudden changes in shape, illumination, and shadow, as well as heavy occlusion, further enhancement of our model is needed.
ACKNOWLEDGMENT

We would like to thank the anonymous reviewers for their valuable comments and suggestions, which have helped to significantly improve the quality of this manuscript.

REFERENCES

[1] M. Szummer and R.W. Picard, "Temporal texture modeling," in Proc. IEEE Int. Conf. Image Process., vol. 3, 1996, pp. 823-826.
[2] G. Doretto, A. Chiuso, Y. Wu, and S. Soatto, "Dynamic texture," Int. J. Comput. Vis., vol. 51, no. 2, pp. 91-109, 2003.
[3] B. Ghanem and N. Ahuja, "Phase based modelling of dynamic textures," in Proc. IEEE Int. Conf. Comput. Vis., Jun. 2007, pp. 1-8.
[4] R. Costantini, L. Sbaiz, and S. Süsstrunk, "Higher order SVD analysis for dynamic texture synthesis," IEEE Trans. Image Process., vol. 17, no. 1, pp. 42-52, Jan. 2008.
[5] L. Yuan, F. Wen, C. Liu, and H-Y. Shum, "Synthesizing dynamic texture with closed-loop linear dynamic system," in Proc. Eur. Conf. Comput. Vis., vol. 3022, 2004, pp. 603-616.
[6] J. Huang, X. Huang, and D. Metaxas, "Optimization and learning for registration of moving dynamic textures," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2007.
[7] D. Mahajan, F-C. Huang, W. Matusik, R. Ramamoorthi, and P. Belhumeur, "Moving gradients: a path-based method for plausible image interpolation," ACM Trans. Graph., vol. 28, no. 3, article 42, 2009.
[8] F. Neyret, "Advected textures," in Proc. ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 2003, pp. 147-153.
[9] A. Bousseau, F. Neyret, J. Thollot, and D. Salesin, "Video watercolorization using bidirectional texture advection," ACM Trans. Graph., vol. 26, no. 3, pp. 1-7, 2007.
[10] M. Bertalmio, A. Bertozzi, and G. Sapiro, "Navier-Stokes, fluid dynamics, and image and video inpainting," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2001, pp. 1-8.
[11] J.X. Chen, N.d.V. Lobo, C.E. Hughes, and J.M. Moshell, "Real-time fluid simulation in a dynamic virtual environment," IEEE Computer Graphics and Applications, vol. 17, no. 3, pp. 52-61, May-June 1997.
[12] Y. Matsushita, E. Ofek, W. Ge, X. Tang, and H-Y. Shum, "Full-frame video stabilization with motion inpainting," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 7, pp. 1150-1163, July 2006.
[13] S. Chen and L. Williams, "View interpolation for image synthesis," ACM Trans. Graph., pp. 279-288, 1993.
[14] N. Papadakis, T. Corpetti, and E. Mémin, "Dynamically consistent optical flow estimation," in Proc. IEEE Int. Conf. Comput. Vis., 2007, pp. 1-8.
[15] N. Papadakis and E. Mémin, "Variational optimal control technique for the tracking of deformable objects," in Proc. IEEE Int. Conf. Comput. Vis., 2007, pp. 1-8.
[16] D.N. Metaxas, Physics-Based Deformable Models: Applications to Computer Vision, Graphics, and Medical Imaging, 1st edition, 1996.
[17] T. Corpetti, E. Mémin, and P. Pérez, "Dense estimation of fluid flows," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 3, pp. 365-381, Mar. 2002.
[18] A. Schödl, R. Szeliski, D.H. Salesin, and I. Essa, "Video textures," in Proc. ACM SIGGRAPH, 2000, pp. 489-498.
[19] A.W. Fitzgibbon, Y. Wexler, and A. Zisserman, "Image-based rendering using image-based priors," in Proc. IEEE Int. Conf. Comput. Vis., 2003, pp. 1176-1183.
[20] G. Wolberg, Digital Image Warping, IEEE Press, 1990.
[21] Y.Z. Wang and S.C. Zhu, "Analysis and synthesis of textured motion: particle, wave and cartoon sketch," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 10, pp. 1348-1363, Oct. 2004.
[22] A. Treuille, A. McNamara, Z. Popovic, and J. Stam, "Keyframe control of smoke simulations," ACM Trans. Graph., pp. 716-723, 2003.
[23] M. Sun and A.D. Jepson, "Video input driven animation (VIDA)," in Proc. IEEE Int. Conf. Comput. Vis., 2003, pp. 93-103.
[24] Y-Y. Chuang, D.B. Goldman, K.C. Zheng, B. Curless, D. Salesin, and R. Szeliski, "Animating pictures with stochastic motion textures," ACM Trans. Graph., vol. 24, no. 3, pp. 853-860, 2005.
[25] Z. Lin, L. Wang, Y. Wang, S.B. Kang, and T. Fang, "High resolution animated scenes from stills," IEEE Trans. Vis. Comput. Graphics, vol. 13, no. 3, pp. 562-568, 2007.
[26] Z. Bar-Joseph, R. El-Yaniv, D. Lischinski, and M. Werman, "Texture mixing and texture movie synthesis using statistical learning," IEEE Trans. Vis. Comput. Graphics, vol. 7, no. 2, pp. 120-135, 2001.
[27] J. Stam, "Stable fluids," in Proc. ACM SIGGRAPH, 1999, pp. 121-128.
[28] J. Stam and E. Fiume, "Turbulent wind fields for gaseous phenomena," in Proc. ACM SIGGRAPH, 1993, pp. 369-376.
[29] J.J. van Wijk, "Image based flow visualization," ACM Trans. Graph., pp. 745-754, 2002.
[30] T. Yabe, F. Xiao, and T. Utsumi, "The constrained interpolation profile method for multiphase analysis," J. Computational Physics, vol. 169, pp. 556-593, 2001.
[31] T. Brox, A. Bruhn, N. Papenberg, and J. Weickert, "High accuracy optical flow estimation based on a theory for warping," in Proc. Eur. Conf. Comput. Vis., 2004.
[32] V. Kwatra, I. Essa, A. Bobick, and N. Kwatra, "Texture optimization for example-based synthesis," ACM Trans. Graph., 2005.
[33] M. Li and P. Vitanyi, An Introduction to Kolmogorov Complexity and Its Applications, 3rd edition, Springer Verlag, 2008.
[34] J.O. Hinze, Turbulence, McGraw-Hill, 1975.
[35] S-B. Kim, "Eliminating extrapolation using point distribution criteria in scattered data interpolation," Comput. Vis. Image Understanding, vol. 95, no. 1, pp. 30-53, 2004.
[36] J.L. Barron, D.J. Fleet, and S.S. Beauchemin, "Performance of optical flow techniques," Int. J. Comput. Vis., vol. 12, pp. 43-77, 1994.
[37] E. Mémin and P. Pérez, "Fluid motion recovery by coupling dense and parametric vector fields," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 1999, pp. 620-625.
[38] R.P. Wildes, M.J. Amabile, A.M. Lanzillotto, and T.S. Leu, "Recovering estimates of fluid flow from image sequence data," Comput. Vis. Image Understanding, vol. 80, pp. 246-266, 2000.
[39] Y. Nakajima, H. Inomata, H. Nogawa, Y. Sato, S. Tamura, K. Okazaki, and S. Torii, "Physics-based flow estimation of fluids," Pattern Recognit., vol. 36, pp. 1203-1212, 2003.
[40] L. Zhou, C. Kambhamettu, and D.B. Goldgof, "Fluid structure and motion analysis from multi-spectrum 2D cloud image sequence," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2000, pp. 744-751.
[41] D. Bereziat, I. Herlin, and L. Younes, "A generalized optical flow constraint and its physical interpretation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2000, pp. 487-492.
[42] E. Arnaud, E. Mémin, R. Sosa, and G. Artana, "A fluid motion estimator for schlieren image velocimetry," in Proc. Eur. Conf. Comput. Vis., LNCS 3951, 2006, pp. 198-210.
[43] H. Murase, "Surface shape reconstruction of a nonrigid transparent object using refraction and motion," IEEE Trans. Pattern Anal. Mach. Intell., vol. 14, no. 10, pp. 1045-1052, Oct. 1992.
[44] S. Negahdaripour, "Revised definition of optical flow: integration of radiometric and geometric clues for dynamic scene analysis," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 9, pp. 961-979, 1998.
[45] H.W. Haussecker and D.J. Fleet, "Computing optical flow with physical models of brightness variation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 6, pp. 661-673, Jun. 2001.
[46] M.J. Black and P. Anandan, "The robust estimation of multiple motions: parametric and piecewise-smooth flow fields," Comput. Vis. Image Understanding, vol. 63, no. 1, pp. 75-104, 1996.
[47] B. Jähne and S. Waas, "Optical wave measurement technique for small scale water surface waves," in Proc. Advances in Optical Instruments for Remote Sensing, 1989, pp. 147-152.
[48] T.K. Holland, "Application of the linear dispersion relation with respect to depth inversion and remotely sensed imagery," IEEE Trans. Geosci. Remote Sens., vol. 39, no. 9, pp. 2060-2072, Sep. 2001.
[49] L. Spencer, M. Shah, and R.K. Guha, "Determining scale and sea state from water video," IEEE Trans. Image Process., vol. 15, no. 6, pp. 1525-1535, Jun. 2006.
[50] J. Tessendorf, "Simulating ocean water," in ACM SIGGRAPH Course Notes, 1999.
[51] B. Kinsman, Wind Waves: Their Generation and Propagation on the Ocean Surface, Prentice-Hall, 1965.
[52] K. Horikawa, An Introduction to Ocean Engineering, Tokyo Univ. Press, 2004.
[53] S. Ali and M. Shah, "A Lagrangian particle dynamics approach for crowd flow segmentation and stability analysis," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2007, pp. 1-8.
[54] C. Yuksel, D.H. House, and J. Keyser, "Wave particles," ACM Trans. Graph., vol. 26, no. 3, 2007.

[55] H. Wang, M. Liao, Q. Zhang, R. Yang, and G. Turk, "Physically guided liquid surface modeling from videos," ACM Trans. Graph., vol. 28, no. 3, article 90, 2009.
[56] R. Vidal and A. Ravichandran, "Optical flow estimation and segmentation of multiple moving dynamic textures," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2005, pp. 1-8.
[57] A.B. Chan and N. Vasconcelos, "Modeling, clustering, and segmenting video with mixtures of dynamic textures," IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 5, pp. 909-926, May 2008.
[58] F. Li, L. Xu, P. Guyenne, and J. Yu, "Recovering fluid-type motions using Navier-Stokes potential flow," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2010, pp. 1-8.
[59] A. Cuzol and E. Mémin, "A stochastic filtering technique for fluid flow velocity fields tracking," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 7, pp. 1278-1293, July 2009.
[60] D. Heitz, E. Mémin, and C. Schnörr, "Variational fluid flow measurements from image sequences: synopsis and perspectives," Experiments in Fluids, vol. 48, no. 3, pp. 369-393, 2010.
[61] B.K.P. Horn and B.G. Schunck, "Determining optical flow," Artificial Intell., vol. 17, pp. 185-204, 1981.
[62] T. Corpetti and E. Mémin, "Stochastic uncertainty models for the luminance consistency assumption," IEEE Trans. Image Process., vol. 21, no. 2, pp. 481-493, Feb. 2012.
[63] P. Heas, C. Herzet, and E. Mémin, "Bayesian inference of models and hyperparameters for robust optical-flow estimation," IEEE Trans. Image Process., vol. 21, no. 4, pp. 1437-1451, Apr. 2012.
[64] F. Becker, B. Wieneke, S. Petra, A. Schröder, and C. Schnörr, "Variational adaptive correlation method for flow estimation," IEEE Trans. Image Process., vol. 21, no. 6, pp. 3053-3065, June 2012.
[65] A. Doshi and A.G. Bors, "Robust processing of optical flow of fluids," IEEE Trans. Image Process., vol. 19, no. 9, pp. 2332-2344, Sept. 2010.
[66] Y. Yamashita, T. Harada, and Y. Kuniyoshi, "Causal flow," IEEE Trans. Multimedia, vol. 14, no. 3, pp. 619-629, June 2012.
[67] N. Ray, "Computation of fluid and particle motion from a time-sequenced image pair: a global outlier identification approach," IEEE Trans. Image Process., vol. 20, no. 10, pp. 2925-2936, Oct. 2011.
[68] P. Heas, E. Mémin, N. Papadakis, and A. Szantai, "Layered estimation of atmospheric mesoscale dynamics from satellite imagery," IEEE Trans. Geosci. Remote Sens., vol. 45, no. 12, pp. 4087-4104, Dec. 2007.
[69] M. Thomas, C. Kambhamettu, and C.A. Geiger, "Motion tracking of discontinuous sea ice," IEEE Trans. Geosci. Remote Sens., vol. 49, no. 12, pp. 5064-5079, Dec. 2011.
[70] L. Xu, J. Jia, and Y. Matsushita, "Motion detail preserving optical flow estimation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 9, pp. 1744-1757, Sept. 2012.
[71] P. Heas, C. Herzet, E. Mémin, D. Heitz, and P.D. Mininni, "Bayesian estimation of turbulent motion," IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 6, pp. 1343-1356, June 2013.
[72] C. Li, D. Pickup, T. Saunders, D. Cosker, D. Marshall, P. Hall, and P. Willis, "Water surface modeling from a single viewpoint video," IEEE Trans. Vis. Comput. Graph., vol. 19, no. 7, pp. 1242-1251, July 2013.
[73] J. Chen, G. Zhao, M. Salo, E. Rahtu, and M. Pietikäinen, "Automatic dynamic texture segmentation using local descriptors and optical flow," IEEE Trans. Image Process., vol. 22, no. 1, pp. 326-339, Jan. 2013.
[74] C. Cassia, S. Simoens, V. Prinet, and L. Shao, "Sub-grid physical optical flow for remote sensing of sandstorm," in Proc. IEEE Int. Geosci. Remote Sens. Symposium, July 2010, pp. 2230-2233.
[75] S. Roth and M.J. Black, "On the spatial statistics of optical flow," in Proc. 10th IEEE Int. Conf. Comput. Vis., 2005, pp. 42-49.
[76] D. Sun, S. Roth, and M.J. Black, "Secrets of optical flow estimation and their principles," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2010.
[77] S. Baker, D. Scharstein, J.P. Lewis, S. Roth, M.J. Black, and R. Szeliski, "A database and evaluation methodology for optical flow," Int. J. Comput. Vis., vol. 92, pp. 1-31, 2011.
[78] H. Sakaino, "Motion estimation for dynamic texture videos based on locally and globally varying models," IEEE Trans. Image Process., vol. 24, no. 11, pp. 3609-3623, Nov. 2015.
[79] T. Brox and J. Malik, "Large displacement optical flow: descriptor matching in variational motion estimation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 3, pp. 500-513, Mar. 2011.
[80] Y. Zhang, J. Xiao, J. Hays, and P. Tan, "Framebreak: dramatic image extrapolation by guided shift-maps," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2013, pp. 1-8.
[81] Y. Wexler, E. Shechtman, and M. Irani, "Space-time completion of video," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 3, pp. 463-478, Mar. 2007.
[82] K. He and J. Sun, "Image completion approaches using the statistics of similar patches," IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 12, pp. 2423-2435, Dec. 2014.
[83] A. Agarwala, M. Dontcheva, M. Agrawala, S. Drucker, A. Colburn, B. Curless, D. Salesin, and M. Cohen, "Interactive digital photomontage," ACM Trans. Graph., vol. 23, no. 3, pp. 294-302, 2004.
[84] M. Bar, "Visual objects in context," Nature Reviews Neuroscience, vol. 5, no. 8, pp. 617-629, 2004.
[85] J. Hays and A.A. Efros, "Scene completion using millions of photographs," ACM Trans. Graph., vol. 26, no. 3, 2007.
[86] H. Intraub and M. Richardson, "Wide-angle memories of close-up scenes," J. Experimental Psychology: Learning, Memory, and Cognition, vol. 15, no. 2, Mar. 1989.
[87] H. Sakaino, "Spatio-temporal image pattern prediction method based on a physical model with time-varying optical flow," IEEE Trans. Geosci. Remote Sens., vol. 51, no. 5, pp. 3023-3036, May 2013.
[88] S. Fazekas, T. Amiaz, D. Chetverikov, and N. Kiryati, "Dynamic texture detection based on motion analysis," Int. J. Comput. Vis., vol. 82, pp. 48-63, 2009.
[89] M. Griebel, T. Dornseifer, and T. Neunhoeffer, Numerical Simulation in Fluid Dynamics: A Practical Introduction (Monographs on Mathematical Modeling and Computation), Soc. for Industrial and Applied Mathematics, Dec. 1997.

Hidetomo Sakaino (M'11-SM'16) received the B.S. and M.S. degrees in nuclear engineering and biomedical engineering from Hokkaido University, Japan, in 1986 and 1988, respectively, and the Ph.D. degree in engineering from the University of Tokyo, Japan, in 2011. In 1988, he joined the Human Interface Labs of Nippon Telegraph and Telephone (NTT) Corporation. In 1990, he also joined the Advanced Telecommunications Research Institute International. In 2001 and 2006, he joined NTT Communication Science Labs and NTT Energy and Environment Systems Labs, respectively. He is now at NTT Network Technology Labs. His main research interests include computer vision, image processing, nonlinear signal processing, and energy optimization, especially in the understanding of real-world changes. His research areas range from physics-based modeling from optical flow, image synthesis, and image prediction to stochastic tracking. His computer-vision-based weather-radar image prediction method has been launched commercially in Japan. He has published over 37 papers in peer-reviewed journals and conferences, and holds over 150 issued patents, including international patents in the USA, Germany, and France. He is a senior member of the IEEE and of the IEEE Multimedia Communications Technical Committee.
