Figure 1. Eight Butterflies for the DCT (labels surviving from the figure: Distributed Arithmetic, F4, F5, F6, F7)
4. Hardware implementation

This section discusses the hardware implementation of the CoDA architecture. A block diagram of the architecture is shown in Figure 2.

Figure 2. Core processor block diagram

The inputs for the first 1D DCT module are eight 9-bit vectors for one row of the macroblock. After the 1D DCT, the output word is 14 bits, which is passed on to the 8 x 8 word transposition RAM, where the output vector from the 1D DCT is transposed. After the second stage of applying the DCT (which completes the 2D DCT) we get the 12-bit outputs for each of the eight input vectors.

Figure 3. Implemented 8 x 1 DCT architecture. In the second stage the datapath remains the same, with changed bit sizes for the adders and latches.

The output is moved to the transpose buffer, where we have to wait for eight clock cycles for one row to fill, after which the transposition begins. In the second part of the DCT (the second 8 x 1 DCT to complete the column transform), the datapath structure is the same as in Figure 3. However, the adders and latches are different. In stage one there are twelve 14-bit adders followed by 15-bit latches. Stage two has twenty-two 15-bit adders followed by 16-bit latches. The eight 4:2 compressor trees produce 12-bit outputs which are finally added together to produce the final 8 x 8 transform.

5. Simulation and Results

The CoDA 8x8 2D DCT architecture was modeled using Verilog and simulated in ModelSim. Code for the architecture was also written in MATLAB. Binary images were used in MATLAB as test input vectors. The same test vectors were used to test the synthesized design and the fabricated chip. The first
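
For readers who want a software reference to compare against, the row-column flow described in Section 4 (a first 8 x 1 DCT on each row, a transposition, then a second 8 x 1 DCT for the column transform) can be sketched in a few lines of Python. This is only an illustrative floating-point model under stated assumptions: it uses the textbook orthonormal DCT-II formula rather than the CoDA adder-based datapath, it does not model the 9-bit, 14-bit, or 12-bit word widths, and the names dct_1d, transpose, and dct_2d are hypothetical and do not come from the authors' Verilog or MATLAB code.

```python
import math

N = 8  # block size: 8 x 8 macroblock rows and columns


def dct_1d(x):
    """Orthonormal 8-point DCT-II of one vector (direct formula, not CoDA)."""
    out = []
    for k in range(N):
        # Assumed normalisation: sqrt(1/N) for DC, sqrt(2/N) otherwise.
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        acc = sum(
            x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
            for n in range(N)
        )
        out.append(scale * acc)
    return out


def transpose(block):
    """Software stand-in for the 8 x 8 word transposition RAM."""
    return [list(col) for col in zip(*block)]


def dct_2d(block):
    """2D DCT as two 1D passes separated by a transposition."""
    first_pass = [dct_1d(row) for row in block]      # first 8 x 1 DCT per row
    buffered = transpose(first_pass)                  # transposition step
    second_pass = [dct_1d(col) for col in buffered]  # column transform
    # Transposing back is a choice made here so that out[u][v] is indexed
    # by (vertical, horizontal) frequency; the hardware may not do this.
    return transpose(second_pass)


if __name__ == "__main__":
    flat = [[100] * N for _ in range(N)]
    coeffs = dct_2d(flat)
    print(round(coeffs[0][0], 6))
```

A constant block is a convenient sanity check for such a model, since all of its energy should end up in the single DC coefficient and every other coefficient should be zero up to rounding error.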
7. Acknowledgements
The authors acknowledge the support of the United
States Department of Energy (DoE) EETAPP program
(DE-97ER12220) and the Governor’s Information
Technology Initiative. We would like to thank MOSIS
for fabricating our chip.
8. References
[1] Jiun-In Guo, Rei-Chin Ju, Jia-Wei Chen, “An efficient 2-D DCT/IDCT core design using cyclic convolution and adder-based realization,” IEEE Trans. CAS for Video Technology, Vol. 14, Issue 4, pp. 416-428, April 2004.