

Motion estimation techniques in video processing
By Milind Phadtare
System Architect, Video
NXP Semiconductors India Pvt. Ltd

Motion estimation techniques form the core of video compression and video processing applications. Motion estimation extracts motion information from the video sequence. The motion is typically represented using a motion vector (x, y), which indicates the displacement of a pixel or a pixel block from its current location due to motion. Motion information is used in video compression to find the best matching block in the reference frame, so that the residue to be coded has low energy, and in scan rate conversion to generate temporally interpolated frames. It is also used in applications such as motion compensated de-interlacing, video stabilisation and motion tracking.

A variety of motion estimation techniques are available. Pel-recursive techniques derive a motion vector for each pixel. The phase plane correlation technique generates motion vectors via the correlation between the current frame and the reference frame, computed using a 2D DFT. However, the most popular technique is the Block Matching Algorithm, and this article mainly discusses it.

Block Matching Algorithm

The Block Matching Algorithm (BMA) is the most popular motion estimation algorithm. BMA calculates a motion vector for an entire block of pixels instead of for individual pixels, and the same motion vector applies to all the pixels in the block. This reduces the computational requirement and also results in a more accurate motion vector, since objects are typically clusters of pixels.

BMA is illustrated in figure 1. The current frame is divided into pixel blocks, and motion estimation is performed independently for each pixel block. Motion estimation is done by identifying the pixel block in the reference frame that best matches the current block, whose motion is being estimated. The reference pixel block is located by displacing the current block's location in the reference frame. The displacement is provided by the motion vector (MV), which is a pair (x, y) of horizontal and vertical displacement values.

Figure 1: Block matching algorithm.

There are various criteria for measuring how well two blocks match. Two popular criteria are listed below, where C(i, j) is a pixel of the N x N current block and R(i + x, j + y) is the corresponding pixel of the reference block displaced by the candidate MV (x, y).

Sum of Square Error (SSE):

$$\mathrm{SSE}(x, y) = \sum_{j=0}^{N-1}\sum_{i=0}^{N-1}\bigl(C(i, j) - R(i + x,\, j + y)\bigr)^2$$

Sum of Absolute Difference (SAD):

$$\mathrm{SAD}(x, y) = \sum_{j=0}^{N-1}\sum_{i=0}^{N-1}\bigl|C(i, j) - R(i + x,\, j + y)\bigr|$$

SSE provides more accurate block matching but requires more computation. SAD provides a fairly good match at a lower computational cost, so it is widely used for block matching. Other criteria are also available, such as cross correlation and maximum matching pixel count.

The reference pixel blocks are taken only from a region known as the search area. The search region defines the boundary for the motion vectors and limits the number of blocks to evaluate. The height and width of the search region depend on the motion in the video sequence and on the available computing power: a bigger search region requires more computation because more candidates must be evaluated. Typically the search region is kept wider than it is tall, since many video sequences exhibit panning motion. The search region can also be changed adaptively depending on the detected motion. The horizontal and vertical search ranges, Sx and Sy, define the search area (±Sx and ±Sy), as illustrated in figure 1.

Full search block matching

The full search block matching algorithm evaluates every possible pixel block in the search region. Hence, it finds the best block matching motion vector, and this type of BMA gives the least possible residue for video compression. However, the required computation is prohibitively high because of the large number of candidates to evaluate, which is (2Sx + 1) * (2Sy + 1). Hence, full search is typically not used. Also, it does not guarantee the consistent motion vectors required for video processing applications.

There are several fast block-matching algorithms, which reduce the number of evaluated candidates while trying to keep good block matching accuracy. Note that since these algorithms test only a limited number of candidates, they might select a candidate corresponding to a local minimum, unlike full search, which always finds the global minimum. Some of these algorithms are described below.

Three step search

In a three-step search (TSS) algorithm, the first iteration evaluates nine candidates, as shown in figure 2. The candidates are centred around the current block's position, and the step size for the first iteration is typically set to half the search range. During the next iteration, the search centre is shifted to the best matching candidate from the first iteration, and the step size is reduced by half. The same process continues until the step size becomes one pixel. This is the last iteration of the three-step search algorithm: the best matching candidate from this iteration is selected as the final candidate, and its motion vector is assigned to the current block. Three-step search evaluates far fewer candidates than the full search algorithm, and the number of evaluated candidates is fixed by the step size set in the first iteration.
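The matching criteria, the full search and the three-step search described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: frames are plain lists of rows of luma samples (`frame[y][x]`), a block is N x N with its top-left corner at (bx, by), a candidate whose block falls outside the reference frame simply gets infinite cost, and the names `block_cost`, `make_cost`, `full_search` and `three_step_search` are hypothetical. The searches take a generic cost function so that either SAD or SSE can be plugged in.

```python
# Sketch only: frame layout, function names and out-of-frame handling are assumptions.
from typing import Callable, List, Tuple

Frame = List[List[int]]             # frame[y][x], luma samples
Cost = Callable[[int, int], float]  # matching cost of candidate MV (dx, dy)
MV = Tuple[int, int]


def block_cost(cur: Frame, ref: Frame, bx: int, by: int, n: int,
               dx: int, dy: int, squared: bool = False) -> float:
    """SAD (or SSE when squared=True) between the n x n current block at
    (bx, by) and the reference block displaced by the candidate MV (dx, dy)."""
    rx, ry = bx + dx, by + dy
    if rx < 0 or ry < 0 or ry + n > len(ref) or rx + n > len(ref[0]):
        return float("inf")  # assumption: out-of-frame candidates are rejected
    total = 0
    for j in range(n):
        for i in range(n):
            d = cur[by + j][bx + i] - ref[ry + j][rx + i]
            total += d * d if squared else abs(d)
    return total


def make_cost(cur: Frame, ref: Frame, bx: int, by: int, n: int,
              squared: bool = False) -> Cost:
    """Bind one block so that the searches only need the candidate MV."""
    return lambda dx, dy: block_cost(cur, ref, bx, by, n, dx, dy, squared)


def full_search(cost: Cost, sx: int, sy: int) -> Tuple[MV, float]:
    """Evaluate all (2*sx+1)*(2*sy+1) candidates and keep the global minimum."""
    best_mv, best_cost = (0, 0), cost(0, 0)
    for dy in range(-sy, sy + 1):
        for dx in range(-sx, sx + 1):
            c = cost(dx, dy)
            if c < best_cost:
                best_mv, best_cost = (dx, dy), c
    return best_mv, best_cost


def three_step_search(cost: Cost, s: int) -> Tuple[MV, float]:
    """Three-step search over a +/-s range, first step = half the range."""
    cx, cy = 0, 0
    best_cost = cost(0, 0)
    step = max(1, (s + 1) // 2)  # half the search range, rounded up
    while True:
        nx, ny = cx, cy
        # Test the eight candidates around the centre at the current step;
        # the centre's own cost is already known from the previous iteration.
        for ddy in (-step, 0, step):
            for ddx in (-step, 0, step):
                dx, dy = cx + ddx, cy + ddy
                if (ddx, ddy) == (0, 0) or abs(dx) > s or abs(dy) > s:
                    continue
                c = cost(dx, dy)
                if c < best_cost:
                    best_cost, nx, ny = c, dx, dy
        cx, cy = nx, ny  # shift the search centre to the best candidate
        if step == 1:    # the one-pixel iteration is the last one
            return (cx, cy), best_cost
        step //= 2
```

When the current frame is a reference frame with a unique texture shifted by a known motion vector, `full_search` returns that vector with zero cost. For a ±7 search range, this TSS sketch evaluates 25 candidates (9 + 8 + 8) instead of the 225 tested by full search, but on a cost surface with several minima it can stop in a local minimum, as noted above.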

Electronic Engineering Times India | August 2007 | 

Figure 2: Fast block matching algorithms.

2D logarithmic search

2D logarithmic search is another algorithm that tests a limited number of candidates, and it is similar to the three-step search. During the first iteration, a total of five candidates are tested. The candidates are centred around the current block location in a diamond shape, and the step size for the first iteration is set to half the search range. For the second iteration, the centre of the diamond is shifted to the best matching candidate. The step size is reduced by half only if the best candidate happens to be the centre of the diamond; if it is not, the same step size is used for the second iteration as well. In that case, some of the diamond candidates were already evaluated during the first iteration, so there is no need to repeat the block matching calculation for them: the results from the first iteration can be reused. The process continues until the step size becomes one pixel. In this iteration all eight surrounding candidates are evaluated, and the best matching candidate is selected for the current block. The number of evaluated candidates is variable for the 2D logarithmic search. However, the worst-case and best-case numbers of candidates can be calculated.

One-at-a-time search

The one-at-a-time search algorithm estimates the x-component and the y-component of the motion vector independently. The candidate search is first performed along the x-axis. During each iteration, a set of three neighbouring candidates along the x-axis is tested. The three-candidate set is then shifted towards the best matching candidate, which becomes the centre of the set for the next iteration. The process stops when the best matching candidate is the centre of the candidate set, and the location of this candidate on the x-axis is used as the x-component of the motion vector. The search then continues parallel to the y-axis, and a similar procedure is followed to estimate the y-component of the motion vector. One-at-a-time search tests fewer candidates on average; however, its motion vector accuracy is poor.


The required computations are significantly reduced by the above fast algorithms. However, for higher resolutions it might still be necessary to reduce computation further to meet real-time requirements. Typically this is achieved by using sub-sampled SAD calculation and by skipping alternate blocks for motion estimation. For sub-sampled SAD calculation, only alternate samples in the vertical and horizontal directions are used, which reduces the SAD computation to one quarter. Motion estimation can also be performed only for alternate blocks; the motion vectors for the skipped blocks can then be generated by interpolation from the neighbouring blocks' motion vectors.

The choice of block size also determines the trade-off between required computation and motion vector accuracy. A smaller block size can accurately describe the motion of even small objects but needs more computation. For example, the H.264 video coding standard allows a block size of 4x4. Typical block sizes are 16x16, 8x8 and 4x4. Note that a block smaller than 4x4 may not contain sufficient texture, so it may not be possible to find a reliable block match.

Sub-pixel motion estimation

The actual motion in the video sequence can be much finer than one pixel, so the displaced object might not lie on the integer pixel grid. To get a better match, the motion estimation needs to be performed on a sub-pixel grid, at either half-pixel or quarter-pixel resolution. The reference block on the sub-pixel grid is generated using either bi-linear interpolation or a more sophisticated six-tap filter, as used in the H.264 standard. Normally, the integer motion vector is first refined to half-pixel accuracy by testing the eight neighbouring half-pixel locations, and then to quarter-pixel accuracy by testing the eight neighbouring quarter-pixel locations. This is illustrated in figure 3.

Figure 3: Sub-pixel motion estimation.

Hierarchical block matching

The Hierarchical Block Matching Algorithm is a more sophisticated motion estimation technique. It provides consistent motion vectors by successively refining the motion vector at different resolutions. In hierarchical motion estimation, a pyramid of reduced-resolution video frames is formed. The original video frame forms the highest-resolution image, and the other images in the pyramid are formed by down-sampling the original image; a simple bi-linear down-sampling can be used. This is illustrated in figure 4. A block of NxN at the highest resolution is reduced to (N/2)x(N/2) at the next resolution level, and the search range is reduced in the same way. The motion estimation process starts at the lowest resolution, where full search motion estimation is typically performed for each block. Since the block size and the search range are reduced, this does not require much computation. The motion vectors from the lowest resolution are scaled and passed on as candidate motion vectors for each block at the next level, where they are refined with a smaller search area. Close to the highest resolution, a simpler motion estimation algorithm and a small search range are enough, since the motion vectors are already close to the accurate motion vectors.

Figure 4: Hierarchical block matching algorithm.

Global motion estimation

There is another type of motion estimation technique known as global motion estimation. The motion estimation techniques discussed so far are useful for estimating local motion (i.e. the motion of objects within the video frame). However, the video sequence can also contain global motion. For some applications, such as video stabilisation, it is more useful to find the global motion than the local motion. In global motion, the same type of motion applies to every pixel in the video frame. Examples of global motion are panning, tilting and zooming in/out. In all these cases, every pixel moves according to the same global motion model. The global motion vector (mv_x, mv_y) of a pixel or pixel block at position (x, y) can be described using the following parametric model with four parameters:

$$mv_x = p_0\,x + q_0, \qquad mv_y = p_1\,y + q_1$$

For pan and tilt global motion, only q0 and q1 are non-zero, i.e. the motion vector is constant over the entire video frame. For a pure zoom in/out, only p0 and p1 are non-zero. However, a combination of all the parameters is usually present. Global motion estimation involves calculating the four parameters of the model (p0, p1, q0, q1). The parameters can be calculated by treating them as four unknowns, so ideally sample motion vectors at four different locations can be used to calculate them. In practice, though, more processing is needed to get a good estimate of the parameters. Note also that local motion estimation, at least at four locations, is still essential for calculating the global motion parameters this way. However, there are algorithms for global motion estimation that do not rely on local motion estimation. The four-parameter model above cannot fit rotational global motion; for rotational motion a six-parameter model is needed. However, the same four-parameter model concepts can be extended to the six-parameter model.
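One way to obtain the four parameters from more than the minimum number of local motion vectors is an ordinary least-squares fit. This is an assumption made for illustration, not the method the article prescribes: because the model above separates into mv_x = p0 x + q0 and mv_y = p1 y + q1, each pair of parameters can be fitted with a one-dimensional linear regression. The sample format `((x, y), (mv_x, mv_y))` and the function names are hypothetical.

```python
# Sketch only: least squares is one assumed way to fit the four-parameter model.
from typing import List, Sequence, Tuple

Sample = Tuple[Tuple[float, float], Tuple[float, float]]  # ((x, y), (mv_x, mv_y))
Params = Tuple[float, float, float, float]                # (p0, p1, q0, q1)


def fit_line(coords: Sequence[float], mvs: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of mv = p * coord + q; returns (p, q)."""
    n = len(coords)
    mean_c = sum(coords) / n
    mean_v = sum(mvs) / n
    var = sum((c - mean_c) ** 2 for c in coords)
    if var == 0:
        raise ValueError("samples must cover at least two distinct positions")
    cov = sum((c - mean_c) * (v - mean_v) for c, v in zip(coords, mvs))
    p = cov / var
    return p, mean_v - p * mean_c


def fit_global_motion(samples: List[Sample]) -> Params:
    """Estimate (p0, p1, q0, q1) of mv_x = p0*x + q0, mv_y = p1*y + q1."""
    p0, q0 = fit_line([pos[0] for pos, _ in samples], [mv[0] for _, mv in samples])
    p1, q1 = fit_line([pos[1] for pos, _ in samples], [mv[1] for _, mv in samples])
    return p0, p1, q0, q1


def global_mv(params: Params, x: float, y: float) -> Tuple[float, float]:
    """Global motion vector predicted by the model at position (x, y)."""
    p0, p1, q0, q1 = params
    return p0 * x + q0, p1 * y + q1
```

For a pure pan, every sample carries the same vector and the fit returns p0 = p1 = 0 with (q0, q1) equal to that vector. Plain least squares weighs every sample equally, so local vectors taken from independently moving foreground objects will bias the estimate; this is one reason more processing is needed in practice.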


True motion estimation

For video compression applications, it is enough to get the motion vector corresponding to the best match, which in turn results in lower residual energy and better compression. However, for video processing applications, especially scan rate conversion, true motion estimation is desired. In true motion estimation, the motion vectors should represent the true motion of the objects in the video sequence rather than the best block match. Hence, it is important to achieve a consistent motion vector field rather than the best possible match. True motion estimation can be achieved both by post-processing the motion vectors to get a smooth motion vector field and by building consistency measures into the motion estimation algorithm itself.

Three Dimensional Recursive Search (3DRS) is one such algorithm, where the consistency assumption is built into the motion estimation. The algorithm works on two important assumptions: objects are larger than the block size, and objects have inertia. The first assumption suggests that the neighbouring blocks' motion vectors can be used as candidates for the current block. However, for the neighbouring blocks ahead in raster scan order, no motion vectors have been calculated yet. Here, the second assumption is applied, and the motion vectors from the previous frame are used for these blocks. The 3DRS motion estimator's candidate set consists only of spatial and temporal neighbouring motion vectors, which results in a very consistent motion vector field that represents the true motion. To kick-start the algorithm, a random motion vector is also used as a candidate. This is illustrated in figure 5.

Figure 5: Three dimensional recursive search (3DRS) algorithm.

Backward motion estimation

So far, the reference frame has been assumed to be from the past, with motion estimation performed in the forward direction. However, motion estimation can also be performed in the backward direction, using a future frame as the reference. This is useful in video sequences where new objects enter the scene or the background is being uncovered; in both cases, the matching content is available only in future video frames. The occlusion problem can be avoided by using both forward and backward motion estimation.

Object-based estimation

Finally, object-based motion estimation provides the most advanced motion estimation. Here, the video sequence is segmented into objects instead of pixel blocks, and the motion of each object is estimated independently. This provides the best possible true motion estimation, helping both video compression and scan rate conversion. However, object segmentation remains the most complex processing step, which has so far kept object-based motion estimation out of most consumer devices.
