
MPEG Compression
MPEG-2 and MPEG-4
MPEG 2
Video Compression
Bandwidth is precious!
MPEG compression helps get the most out of available bandwidth
A trade off between amount of data to be sent and acceptable picture
quality
Uncompressed high-definition pictures take too much bandwidth
to send down a 6MHz or 8MHz cable channel (up to 40Mbps)
1920 x 1080 24-bit pixels @ 30 frames per second = 1.49Gbps!
Storage of content is also much more efficient with video
compression
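The bandwidth figure above can be checked with a few lines of arithmetic. This is a minimal sketch; the function name `raw_bitrate_bps` is illustrative and the 40 Mbps channel capacity is the upper figure quoted on the slide.

```python
def raw_bitrate_bps(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed video bit rate in bits per second."""
    return width * height * bits_per_pixel * frames_per_second


hd_bps = raw_bitrate_bps(1920, 1080, 24, 30)
channel_bps = 40_000_000  # "up to 40Mbps" cable channel, as quoted on the slide

print(f"Raw HD bit rate: {hd_bps / 1e9:.2f} Gbps")
print(f"Compression ratio needed: roughly {hd_bps / channel_bps:.0f}:1")
```

Under these numbers the raw stream is about 37 times larger than the channel can carry, which is the gap compression has to close.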
Definition     Lines/Frame  Pixels/Line  Aspect Ratios  Frame Rates
High (HD)      1080         1920         16:9           23.976p, 24p, 29.97p, 29.97i, 30p, 30i
High (HD)      720          1280         16:9           23.976p, 24p, 29.97p, 30p, 59.94p, 60p
Standard (SD)  480          704          4:3, 16:9      23.976p, 24p, 29.97p, 29.97i, 30p, 30i, 59.94p, 60p
Standard (SD)  480          640          4:3            23.976p, 24p, 29.97p, 29.97i, 30p, 30i, 59.94p, 60p
ATSC Table III
MPEG 3
MPEG Applications
Bit-rate scale: 5 Kbps, 100 Kbps, 1 Mbps, 10 Mbps, 100 Mbps, 1 Gbps
Applications, in slide order:
Video Phones, GSM/3G Phones
Internet Video, Streaming Apps
Digital Cinema, Studio Profile
Xilinx Solution boxes, in slide order:
Xilinx Solution: Partial MPEG-4 H/W acceleration; Peripherals (IrDA, UART, I2C, SPI); Memory Interfaces (SRAM, SDRAM, Flash)
Xilinx Solution: Peripherals (IrDA, UART, I2C, SPI); Memory Interfaces (SRAM, SDRAM, Flash)
Xilinx Solution: Full/Partial H/W acceleration; High performance parallel DSP algorithms; Multiple processor instantiations (PPC, uBlaze); Custom logic
Xilinx Solution: Full H/W acceleration; Ultra High performance parallel DSP algorithms; Multiple processor instantiations (PPC, uBlaze); Custom logic
MPEG 4
HDTV and Bandwidth
HDTV may eventually prove popular with consumers
HDTV might even be legislated as must carry in the US
Operators face adding channel capacity
Could mean rebuilding entire facility!
Much rather use better compression techniques
Squeeze more down the same coax/fibre
Relatively inexpensive to implement
Also concern about cost of providing high-definition capability at customer premises
Ship out new set-top boxes or upgrade in the field?
MPEG 5
MPEG Compression
Spatial Processing
Uses DCT within a single picture to enable removal of high frequencies not discernible to the human eye
Temporal Processing
Seeking out and removing redundancy
between successive images/frames
Variable Length Coding (VLC)
Use shortest codes for most common samples
Run Length Encoding (RLE)
Replace long strings of zeros with single command code
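Run length encoding of zeros can be sketched as follows. This is an illustration only: the (zero run, value) pairs and the "EOB" end marker are in the spirit of MPEG's run/level coding, but the function names and the marker are assumptions, not the actual MPEG code tables.

```python
def run_length_encode(coeffs):
    """Encode a coefficient list as (zero_run, value) pairs plus an end marker."""
    pairs = []
    run = 0
    for value in coeffs:
        if value == 0:
            run += 1          # extend the current run of zeros
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")       # all trailing zeros collapse into one end-of-block code
    return pairs


def run_length_decode(pairs, length):
    """Rebuild the original list of the given length from the encoded pairs."""
    coeffs = []
    for item in pairs:
        if item == "EOB":
            break
        run, value = item
        coeffs.extend([0] * run)
        coeffs.append(value)
    coeffs.extend([0] * (length - len(coeffs)))  # restore trailing zeros
    return coeffs


block = [12, 0, 0, -3, 5, 0, 0, 0, 0, 1] + [0] * 54
encoded = run_length_encode(block)
print(encoded)
```

Here 64 coefficients shrink to four pairs and one end marker. A variable length coder would then assign the shortest codes to the most common pairs.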
MPEG 6
Spatial Redundancy
DCT
Computes the discrete cosine transform of a video/audio input block
Equivalent to the Fourier transform of an even-symmetric extension of the input, so only cosine (even) terms remain
Converts an image or audio block into its equivalent frequency coefficients
IDCT
Inverse of the DCT function
IDCT reconstructs a sequence from its discrete cosine
transform (DCT) coefficients
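A direct (unoptimised) 1-D DCT and IDCT pair makes the relationship concrete. This sketch assumes the orthonormal DCT-II scaling; the slides do not state which normalisation the Xilinx cores use, and the function names are illustrative.

```python
import math


def _scale(k, n):
    """Orthonormal scaling factor for coefficient k of an n-point DCT-II."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_1d(samples):
    """Forward DCT-II: samples -> frequency coefficients."""
    n = len(samples)
    return [
        _scale(k, n) * sum(
            samples[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        )
        for k in range(n)
    ]


def idct_1d(coeffs):
    """Inverse DCT: frequency coefficients -> reconstructed samples."""
    n = len(coeffs)
    return [
        sum(
            _scale(k, n) * coeffs[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k in range(n)
        )
        for i in range(n)
    ]


samples = [52, 55, 61, 66, 70, 61, 64, 73]
coeffs = dct_1d(samples)
restored = idct_1d(coeffs)
print([round(c, 2) for c in coeffs])
print([round(s) for s in restored])
```

The round trip returns the original samples to within floating-point error. The loss in MPEG comes from quantizing the coefficients, not from the transform itself.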
MPEG 7
DCT in MPEG Compression
[Figure: sensitivity of the human eye plotted against spatial frequency]
Processing steps labelled in the figure, in pipeline order:
Scan picture using 16x16 pixel macroblock
Scan macroblock in 8x8 blocks
Determine luma & chroma pixel values
Luma samples shown here (chroma processed separately)
Convert to frequency components (DCT)
Output DCT coefficients
Human eye less sensitive to high frequencies
Quantize higher frequencies with fewer bits (weighting)
Zero values for frequencies below perception threshold
Compress using zigzag scan and run length encoding for zero values (in blue)
Further compression with Huffman encoding (VLC)
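The quantize and zigzag steps can be sketched on an 8x8 block of DCT coefficients. The step sizes below are a made-up weighting that simply grows with frequency (`hypothetical_step` is not the MPEG default quantization matrix), and all function names are illustrative.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zigzag scan order."""
    def key(pos):
        row, col = pos
        diagonal = row + col
        # Odd diagonals run top-right to bottom-left, even ones the other way.
        return (diagonal, row if diagonal % 2 else -row)
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


def quantize(value, step):
    """Divide by the step size, rounding half away from zero."""
    magnitude = (abs(value) + step // 2) // step
    return -magnitude if value < 0 else magnitude


def hypothetical_step(row, col):
    """Assumed weighting: coarser steps (fewer bits) for higher frequencies."""
    return 8 + 4 * (row + col)


def quantize_and_scan(coeffs):
    """Quantize every coefficient, then read the block out in zigzag order."""
    return [
        quantize(coeffs[r][c], hypothetical_step(r, c))
        for r, c in zigzag_order(len(coeffs))
    ]


coeffs = [[0] * 8 for _ in range(8)]
coeffs[0][0], coeffs[0][1], coeffs[1][0], coeffs[7][7] = 240, -36, 20, 5
scanned = quantize_and_scan(coeffs)
print(scanned[:4], "followed by", scanned[4:].count(0), "zeros")
```

The small high-frequency value at position (7, 7) quantizes to zero, and the zigzag order groups the surviving low-frequency values at the front so the zeros form one long run for the run length encoder.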
MPEG 8
I B B P B B P B B I B B P B B P B B I
GOP (Group Of Pictures)
Temporal Redundancy
I - Intra coded (spatially coded) pictures
Forms the anchor for a GOP
P - Forward Predicted pictures
Predicted from previous I or P pictures
P picture is made up of vectors showing where to get pixel data from in previous pictures and/or values that must be added to the previous picture to get the current pixel value
B - Bidirectional Predicted pictures
Predicted from previous or later I or P pictures (never from other B pictures)
Made up of vectors showing where to get pixel data from in previous or later pictures
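The prediction rules for I, P and B pictures can be expressed as a small lookup. This is a sketch assuming each P picture uses the nearest previous I or P picture and each B picture uses the nearest I or P picture on either side; the function name `gop_references` is illustrative.

```python
def gop_references(pattern):
    """Map each picture index (display order) to the indices it is predicted from."""
    anchors = [i for i, kind in enumerate(pattern) if kind in "IP"]
    references = {}
    for i, kind in enumerate(pattern):
        previous = [a for a in anchors if a < i]
        later = [a for a in anchors if a > i]
        if kind == "I":
            references[i] = []                          # intra coded, no references
        elif kind == "P":
            references[i] = previous[-1:]               # nearest previous I or P
        elif kind == "B":
            references[i] = previous[-1:] + later[:1]   # nearest I or P on each side
        else:
            raise ValueError(f"unknown picture type: {kind!r}")
    return references


pattern = "I B B P B B P B B I".replace(" ", "")
for index, refs in gop_references(pattern).items():
    print(index, pattern[index], refs)
```

B pictures never appear in any reference list, which matches the rule that they are never predicted from other B pictures.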
MPEG 9
Picture Difference
Difference between successive pictures easy to calculate
using subtractor
Picture difference can also be spatially compressed
DCT, VLC, RLE etc. as before
[Figure: the current picture (+) and the previous picture (-), held in a delay buffer, feed a subtractor that outputs the picture difference]
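The subtractor can be sketched on small pictures held as lists of rows. The function names are illustrative, and the 2x3 pictures are toy data.

```python
def picture_difference(current, previous):
    """Pixel-by-pixel difference: current picture minus previous picture."""
    return [
        [c - p for c, p in zip(current_row, previous_row)]
        for current_row, previous_row in zip(current, previous)
    ]


def reconstruct(previous, difference):
    """Decoder side: add the difference back to the stored previous picture."""
    return [
        [p + d for p, d in zip(previous_row, difference_row)]
        for previous_row, difference_row in zip(previous, difference)
    ]


previous = [[100, 100, 100], [100, 100, 100]]
current = [[100, 100, 104], [100, 98, 100]]
difference = picture_difference(current, previous)
print(difference)
```

Most difference values are zero when little changes between pictures, which is why the difference compresses well with DCT, VLC and RLE.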
MPEG 10
MPEG-2 Encoder
[Block diagram blocks: DCT, Inverse DCT, Quantization, Inverse Quantization, Frame Memory, Motion Compensation, Motion Estimation, Encoding]
Hardware acceleration is particularly suited to the most complex parts of the MPEG-2 algorithm
DCT/IDCT
Motion Estimation/Compensation
Acceleration techniques are similar for MPEG-4 or other compression schemes
MPEG 11
MPEG-4 Encoder
[Block diagram blocks: DCT, Inverse DCT, Quantization, Inverse Quantization, Frame Memory, Motion Compensation, Motion Estimation, Encoding, Shape Coding, Multiplex]
MPEG 12
Bogging down the processor
Processors excellent for sequential and control tasks
Implementing MPEG-4 compression algorithms can very
easily bog down a traditional processor
Over 50% of processor cycles may be spent evaluating one
block of an algorithm (motion estimation, etc.)
Calls for dedicated H/W acceleration
FPGAs are ideal
Processor vendors have bolted on dedicated DSP blocks for H/W acceleration
Inflexible and still performance limited
Not suitable for Studio and Digital Cinema applications
MPEG 13
Offloading MPEG Algorithms
Why?
Saves valuable processor cycles
Increased quality and performance
Potential system cost savings
Ability to add more capabilities (codecs) to the system
Which portions are prime targets?
Motion estimation, Motion Compensation, DCT/IDCT,
Color Space Conversion
MPEG 14
Motion Estimation
Estimation predicts next picture by shifting data from previous picture along a
calculated motion vector
In encoder, predicted picture is compared to actual picture and any prediction
errors calculated
Transmitting motion vectors and prediction errors takes much less bandwidth
than coding entire picture
Part of image that is common between frames but has moved
Original position of picture segment in previous frame
Motion vector sent with any prediction errors
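One common way to find a motion vector is full-search block matching with a sum of absolute differences (SAD) cost. The slides do not name the search method, so this is a sketch under that assumption; the block size, search range and function names are illustrative.

```python
def sad(current, reference, bx, by, dx, dy, size):
    """Sum of absolute differences between a current block and a displaced reference block."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(current[by + y][bx + x] - reference[by + dy + y][bx + dx + x])
    return total


def estimate_motion(current, reference, bx, by, size=4, search=2):
    """Full search: try every offset in the window and keep the lowest SAD."""
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= bx + dx and bx + dx + size <= width
                      and 0 <= by + dy and by + dy + size <= height)
            if not inside:
                continue  # candidate block would fall outside the reference picture
            cost = sad(current, reference, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    cost, dx, dy = best
    return (dx, dy), cost


# Toy 8x8 pictures: the current picture is the previous one moved 2 right, 1 down.
previous = [[7 * y * y + 3 * x * x + x * y for x in range(8)] for y in range(8)]
current = [[previous[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(8)]
           for y in range(8)]

vector, error = estimate_motion(current, previous, bx=3, by=3)
print(vector, error)
```

The vector points from the block in the current picture back to where its data sits in the previous picture, so content that moved right and down yields a negative offset. A SAD of zero means no prediction error needs to be sent for this block.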
MPEG 15
Calculating Motion
Estimation Requirements
For CIF (352x288) resolutions
Each block has 16x16 = 256 pixels
16x16 = 256 search positions
Each frame has 396 (352/16 * 288/16) blocks
30 frames per second
256 x 256 x 396 x 30 = 778,567,680 calculations per second
Requires a 779MHz general-purpose processor that can perform addition and subtraction in one clock cycle!
Ideal for implementation inside FPGA
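The workload estimate is a straight product, so it is easy to reproduce. In this sketch the function name is illustrative; note that the slide's total of 778,567,680 corresponds to 30 frames per second, while the same product at 20 frames per second is 519,045,120.

```python
def search_operations_per_second(width, height, fps, block=16, search_positions=256):
    """Pixel comparisons per second for full-search motion estimation."""
    blocks_per_frame = (width // block) * (height // block)
    pixels_per_block = block * block
    return pixels_per_block * search_positions * blocks_per_frame * fps


cif_30 = search_operations_per_second(352, 288, fps=30)
cif_20 = search_operations_per_second(352, 288, fps=20)
print(f"CIF at 30 fps: {cif_30:,} operations per second")
print(f"CIF at 20 fps: {cif_20:,} operations per second")
```

At one operation per clock cycle, the 30 fps case needs a clock of roughly 779 MHz for motion estimation alone.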
MPEG 16
Motion Compensation
Process of compensating for the displacement of moving objects from one frame to the next
The use of motion vectors to improve the efficiency of the prediction of pel values. The prediction uses motion vectors to provide offsets into the past and/or future reference pictures containing previously decoded pel values that are used to form the prediction error signal.
Replaces a picture or portion of a picture based on displaced pels of a previously transmitted frame in an image sequence
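Motion compensation can be sketched as fetching a displaced block from the reference picture and adding the prediction error. The function names and the 6x6 toy reference picture are assumptions for illustration; real codecs also handle sub-pel vectors, which this sketch ignores.

```python
def predict_block(reference, bx, by, vector, size):
    """Fetch the block that the motion vector points to in the reference picture."""
    dx, dy = vector
    return [
        [reference[by + dy + y][bx + dx + x] for x in range(size)]
        for y in range(size)
    ]


def prediction_error(actual, predicted):
    """Encoder side: what the prediction got wrong, pel by pel."""
    return [[a - p for a, p in zip(ar, pr)] for ar, pr in zip(actual, predicted)]


def reconstruct_block(predicted, error):
    """Decoder side: prediction plus transmitted error gives the actual block."""
    return [[p + e for p, e in zip(pr, er)] for pr, er in zip(predicted, error)]


reference = [[10 * y + x for x in range(6)] for y in range(6)]
actual = [[13, 13], [22, 23]]  # block at (bx=1, by=1) in the current picture

predicted = predict_block(reference, bx=1, by=1, vector=(1, 0), size=2)
error = prediction_error(actual, predicted)
print(predicted, error)
```

Only the vector and the sparse error block need to be transmitted; the decoder already holds the reference picture.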
MPEG 17
2-D DCT Operation
[Figure: 1-D DCT on Rows, followed by 1-D DCT on Columns]
Application Note and Reference Design Available
http://www.xilinx.com/xapp/xapp610.pdf
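The row-column decomposition can be sketched in software as two passes of a 1-D transform. This assumes orthonormal DCT-II scaling and is not taken from XAPP610; the function names are illustrative.

```python
import math


def dct_1d(samples):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n = len(samples)
    result = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(
            samples[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        )
        result.append(scale * total)
    return result


def dct_2d(block):
    """2-D DCT as a 1-D DCT on every row, then a 1-D DCT on every column."""
    rows_done = [dct_1d(row) for row in block]
    columns_done = [dct_1d(list(column)) for column in zip(*rows_done)]
    return [list(row) for row in zip(*columns_done)]  # transpose back to row-major


flat_block = [[10] * 8 for _ in range(8)]
coeffs = dct_2d(flat_block)
print(round(coeffs[0][0], 6))
```

A flat block puts all of its energy in the single DC coefficient at position (0, 0). The 2-D IDCT follows the same pattern with the inverse 1-D transform.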
MPEG 18
2-D IDCT Operation
Application Note and Reference Design Available
http://www.xilinx.com/xapp/xapp611.pdf
[Figure: 1-D IDCT on Rows, followed by 1-D IDCT on Columns]
MPEG 19
2-D DCT/IDCT Utilisation
2-D DCT
2-D IDCT
MPEG 20
Partial MPEG H/W Acceleration
[Block diagram blocks: 32-bit Embedded CPU, Processor Interface, Memory Controller (SRAM, SDRAM), Other Custom Logic]
Dedicated H/W acceleration blocks: DCT / IDCT, Motion Compensation, Motion Estimation
Applications: Web Tablets, Internet Appliances, Telematics, High-end PDAs
MPEG 21
Customized MPEG
Implementation
[Block diagram blocks: 32-bit Soft-CPU up to 100 D-MIPS (multiple-processor instantiations), Processor Interface, Memory Controller (SRAM, SDRAM), Other Peripherals]
Dedicated H/W acceleration blocks: DCT / IDCT, Motion Compensation, Motion Estimation
Resolution, frame rate, profile, level & QoR dependent
Applications: Digital TV, Plasma Displays, LCD Displays, Set-top Boxes
MPEG 22
High Performance
MPEG Applications
[Block diagram blocks: 32-bit Hard CPU 420 D-MIPS, 32-bit Soft-CPU up to 100 D-MIPS (up to 4 PowerPC and multiple MicroBlaze instantiations), Processor Interface, Memory Controller (SRAM, SDRAM), Other Peripherals]
Pipelined and parallel dedicated H/W acceleration blocks: DCT / IDCT, Motion Compensation, Motion Estimation
Resolution, frame rate, profile, level & QoR dependent
Applications: Studio Applications, Digital Cinema
MPEG 23
MPEG IP for Xilinx FPGAs
IP Core or Reference Design              Provider
XAPP610 Video Compression using DCT      Xilinx Inc.
XAPP611 Video Compression using IDCT     Xilinx Inc.
XAPP208 IDCT Implementation in Virtex    Xilinx Inc.
2-D DCT                                  Xilinx Inc.
1-D DCT                                  Xilinx Inc.
2-D DCT                                  eInfochips
2-D DCT/IDCT                             BARCO-SILEX
2-D IDCT                                 CAST Inc.
2-D DCT                                  CAST Inc.
2-D DCT/IDCT                             CAST Inc.
DCT/IDCT                                 TILAB
MPEG-2 SDTV I & P Encoder                DumaVideo
MPEG-2 HDTV I & P Encoder                DumaVideo
See http://www.xilinx.com/ipcenter for more details
MPEG 24
FPGA Performance Advantage
Flexible architecture
Distributed DSP resources (LUT,
registers, multipliers, & memory)
Parallel processing maximizes
data throughput
Support any level of parallelism
Optimal performance/cost tradeoff
All 256 MAC operations
in 1 clock cycle
[Figure: FPGA with Data In feeding registers Reg0 to Reg255, coefficients C0 to C255, and Data Out]
Example
256 Tap FIR Filter = 256 multiply and accumulate
(MAC) operations per data sample
MPEG 25
Performance Limitation of Conventional DSP
Fixed inflexible architecture
Typically 1-4 MAC units
Fixed data width
Serial processing limits data throughput
Time-shared MAC unit
High clock frequency creates difficult system challenge
[Figure: Conventional DSP device (Von Neumann architecture, or extensions thereof). Data In feeds a register and a single MAC unit; the loop algorithm runs 256 times to produce Data Out]
Example
256 Tap FIR Filter = 256 multiply and accumulate (MAC) operations per data sample
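The serial case can be sketched as a software FIR filter in which one multiply-accumulate runs per tap per sample. The function name and the `mac_count` counter are illustrative additions used only to make the operation count visible.

```python
def fir_filter(samples, coefficients):
    """Direct-form FIR filter that also counts multiply-accumulate operations."""
    taps = len(coefficients)
    delay_line = [0] * taps          # plays the role of the register chain
    outputs = []
    mac_count = 0
    for sample in samples:
        delay_line = [sample] + delay_line[:-1]   # shift the new sample in
        accumulator = 0
        for coefficient, stored in zip(coefficients, delay_line):
            accumulator += coefficient * stored   # one MAC, done one after another
            mac_count += 1
        outputs.append(accumulator)
    return outputs, mac_count


outputs, mac_count = fir_filter([1, 1, 1], [1] * 256)
print(outputs, mac_count)
```

Three input samples through a 256-tap filter cost 768 sequential MAC operations here. In the FPGA case the inner loop is unrolled into 256 multipliers working in the same clock cycle.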
MPEG 26
Unrivalled DSP Performance
TeraMAC/s via FPGA and Embedded Multiplier fabric for:
Multimedia Compression - MPEG2, MPEG4, H.26L, MJPEG, JPEG2000
Video Processing - Integrated Line Buffers, Enhancement, Pattern Recognition, Noise
Reduction, Resizing, Rotation, Scalability
Convergence of emerging technologies in Multimedia over IP & wireless
For Standard Definition Pixel Rates (13.5 MHz pixels)
SDTV Test equipment, Broadcast test equipment, Studio effects equipment, scan rate
converters, frame rate converters, MPEG-2 codecs
For High Definition Pixel Rates or Multiple Channels of Standard
Definition (74.25 MHz pixels)
HDTV Test equipment, Broadcast test equipment, Home Theatre
projection devices, Advanced studio effects, Conversions from SDTV,
MPEG-2 4:2:2 profile codecs
XtremeDSP for Video
MPEG 27
Traditional Processing
Math-intensive algorithms dominate the processing capacity
[Figure: a CPU with RAM runs a C++ code stack in which control tasks and FIR filter code alternate; on the processing-time axis, labelled "Traditional", the FIR filter work takes up most of the time]
MPEG 28
Xtreme Processing
PowerPC with Application-Specific Hardware Acceleration
[Figure: a PowerPC processor with OCM RAM runs the control tasks of the C++ code stack, while an FIR engine (fabric/multipliers, taps 0 to n) runs the FIR filter; the processing-time axis compares "Traditional" with "XTREME Processing"]
The Virtex-II Pro Advantage
MPEG 29
Chroma Downsampling
Most MPEG-2 applications use 8-bit 4:2:0 sampling
But incoming data usually 10-bit 4:2:2 video
Maybe via SDI (Serial Digital Interface) for example
Conversion therefore needed before MPEG processing
This chroma downsampling is a lossy form of data compression
[Figure: Pixel Data In (SDI, 10-bit) -> Chroma Downsampling (4:2:2 to 4:2:0, 8-bit) -> MPEG Processing: DCT -> Quantize Coefficients -> Zig Zag (Run Length) Encoding -> Huffman (Variable Length) Encoding -> Compressed Video Out]
MPEG 30
4:2:2 and 4:2:0 Sampling
[Figure: 16x16 macroblock showing luma (Y) sample points and chroma (CrCb) sample points for 4:2:2 and 4:2:0; legend: sampled value, value used for interpolation]
4:2:2: Only one horizontal CrCb value used for every Y
4:2:0: Only interpolated CrCb values used
Easy in an FPGA!
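The conversion from 10-bit 4:2:2 to 8-bit 4:2:0 can be sketched in two small steps. This assumes a simple two-tap average of vertically adjacent chroma rows and rounding when dropping two bits; production filters are usually longer, and the function names are illustrative.

```python
def to_8_bit(sample_10bit):
    """Reduce a 10-bit sample to 8 bits with rounding, clamped to 255."""
    return min(255, (sample_10bit + 2) >> 2)


def chroma_422_to_420(chroma_rows):
    """Average each vertical pair of chroma rows, halving vertical chroma resolution."""
    if len(chroma_rows) % 2:
        raise ValueError("expected an even number of chroma rows")
    return [
        [(top + bottom + 1) // 2 for top, bottom in zip(top_row, bottom_row)]
        for top_row, bottom_row in zip(chroma_rows[0::2], chroma_rows[1::2])
    ]


chroma_422 = [[100, 200], [102, 204], [50, 60], [54, 60]]
chroma_420 = chroma_422_to_420(chroma_422)
print(chroma_420)
print(to_8_bit(512), to_8_bit(1023))
```

Four chroma rows become two interpolated rows, and the discarded detail cannot be recovered, which is why the step is lossy. Luma samples are left at full resolution.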
MPEG 31
FIR Filters for Xilinx FPGAs
IP Core or Reference Design                  Provider
XAPP219 Transposed Form FIR Filters          Xilinx Inc.
MAC FIR                                      Xilinx Inc.
Serial Distributed Arithmetic FIR Filter     Xilinx Inc.
Parallel Distributed Arithmetic FIR Filter   Xilinx Inc.
Distributed Arithmetic FIR Filter            Xilinx Inc.
See http://www.xilinx.com/ipcenter for more details
MPEG 32
Xilinx MPEG Encoder Solutions
Allow offload of complex processing to hardware
Leave host processor to manage the system
Increase system performance of ASSP-based designs
Software incapable of supporting smooth, high-quality, full-screen video
streaming
Hardware acceleration becomes a necessity!
MPEG sub-blocks, filters, image quality analysis
Support for multiple channels
ASSPs support one or two channels at most
FPGAs can support many more and decrease overall system cost
Lower bill of materials, easier system management
Wide range of network interfaces supported
Flexibility, scalability, ease of integration
MPEG 33
Differentiation Around a
Mature Standard
FPGA implementation enables differentiation of product
MPEG really only defines bitstream syntax
Difficult to add value to totally ASSP-based algorithms
Proprietary compression and/or image improvements possible whilst still
conforming to standard
IP available for time-to-market advantage
Reprogrammable platform such as Virtex-II Pro ideal for
compression research
Hidden costs associated with ASIC development (e.g. NRE, risk)
not a factor with Xilinx FPGAs
Results in lower overall costs for production
Questions?
espteam@xilinx.com
