
Y. Yang: Three-dimensional Image Processing VLSI System with Network-on-chip System and Reconfigurable Memory Architecture


Three-dimensional Image Processing VLSI System with Network-on-chip System and Reconfigurable Memory Architecture
Yun Yang, Member, IEEE
Abstract: In this paper, we propose a new RAM/ROM module system with a reconfigurable memory architecture for a three-dimensional (3D) image processing VLSI system. Suitable input/output data control is a critical feature of a high-performance image processing system because it enables flexible image data processing. A fast 3D VLSI system also requires efficient pipelined data operation. The new RAM/ROM synthesis design system is realized by a specific arrangement of RAM, ROM, pins, and interconnections. The pipeline flip-flop control, clock buffer insertion, and critical signal routing have been improved to raise the operating speed of the whole system. A network-on-chip system is also proposed to enable fast signal transmission and correct control operation. The 3D image processing VLSI system can be further improved by suitable data storage and pipeline control flow. Chip simulation experiments show accurate results with 247.728 mW power consumption and a 50 MHz processing frequency. Practical chip tests confirm that the new RAM/ROM synthesis design successfully realizes inner-chip write/read functions and efficient data flow control, improving the efficiency of the 3D reconfigurable system. A better image VLSI system can be realized by an elaborate network-on-chip system and precise 3D stacking layer design¹.

Index Terms: Three-dimensional (3D) VLSI, network-on-chip system, reconfigurable memory system, high speed image processing, RAM/ROM synthesis design.

Fig. 1. 3D layer architecture for parallel image processing system.

I. INTRODUCTION

Recently, image processing technology has been widely used in vision systems, multimedia processors, and consumer electronics [1]. This rapidly developing technology requires high-performance image processors with fast computation speed, small chip size, and low power consumption. In addition, flexible data flow, robust signal control, and inner write/read operation are also important for an image processing system. To improve image chip performance, three-dimensional (3D) technology has been used to realize effective image processing VLSI systems [2], [3]. Typical 3D technology separates the whole image chip into several function layers. The different layers are stacked vertically and are connected by

1 This work was supported in part by the ASET and NEDO in Japan. Yun Yang is with R&D Center of Excellence for Integrated Microsystems, Graduate School of Engineering, Tohoku University, Sendai, 980-8579 Japan (e-mail: yunyang@mems.mech.tohoku.ac.jp and yunyangfly@gmail.com).

Through-Silicon Vias (TSVs) between the layers [4]. In Fig. 1, the function layers include a CMOS image sensor layer and an analog-to-digital (A/D) converter layer, which converts the analog image signal into digital input image data [5], [11], [12]. The following stacked layers, such as the frame memory layer, reconfigurable memory layer, and Processing Element (PE) module layer, process the digital input data and realize fast image processing [6]. To improve system operation efficiency and avoid multi-layer pipeline delay, reconfigurable memory technology has been introduced to accelerate 3D image processing [7], [8], [13]-[16]. In addition, recent network-on-chip research has been directed at 3D architecture construction and inter-layer data transmission [17]-[20]. Data synchronization can be improved by a single instruction multiple data (SIMD) stream, and the related pipelined operation stream of multiple instruction multiple data (MIMD) can also be used to improve image VLSI system performance [9], [10].

Global data control is important for a parallel image processing system. To realize suitable data control, several RAM modules are inserted into the PE layer, as in the 3D image chip layer architecture of Fig. 1. Some unneeded ROM parts have been replaced by additional RAM modules. A special flip-flop design and clock buffer adjustment are also used to enable inner data write/read flow. Consequently, image data and control instructions can be inserted or monitored by outside controller parts. The 3D image system can also be realized easily, and the control pipeline thread can be improved by direct data/instruction operation.

The rest of this paper is organized as follows. Section II describes the whole system configuration and layer architecture of the 3D image processing VLSI system. In Section

Contributed Paper Manuscript received 07/14/11 Current version published 09/19/11 Electronic version published 09/19/11.

0098-3063/11/$20.00 © 2011 IEEE


IEEE Transactions on Consumer Electronics, Vol. 57, No. 3, August 2011

III, new design methods for the flip-flop and clock buffer are proposed to solve RAM/ROM co-design operation. Sections IV and V describe the reconfigurable memory system and the processor layer system design. Section VI proposes the 3D network-on-chip system architecture, and Section VII introduces the self-repairable image system for dependable reconfigurable VLSI design. Section VIII presents 3D system simulation results and image chip experiments. Finally, Section IX gives our conclusion and future work.

II. THREE-DIMENSIONAL LAYER ARCHITECTURE

The three-dimensional (3D) architecture for the parallel image processing system is shown in Fig. 1. Several different function layers are stacked vertically, and Through-Silicon Vias (TSVs) connect the chip layers in a specific stacking sequence. Through effective function layer design and precise inter-layer connection, a 3D architecture can reduce chip size, lower power consumption, and accelerate system speed. In addition, image data transmission, system signal bandwidth, and analog-to-digital converter efficiency can also be significantly improved.

As shown in Fig. 1, input image data flow from the top layer to the bottom layer for reconfigurable system operation. The input image signal is sampled by the image sensor layer and converted to digital image data by the A/D converter layer. The frame memory layer, reconfigurable memory layer, and processing element layer then process the digital image data. Reconfigurable system operation requires a careful pipeline thread and intricate state control. Frequent data write/read and direct instruction control are critical characteristics of a 3D pipelined image system. Thus a combined RAM and ROM system has been proposed to realize effective data control and high-performance image processing.

III. 3D RAM/ROM SYNTHESIS DESIGN SYSTEM

A. Synchronous System Architecture

The RAM and ROM modules are used together to realize better data write/read and inner-chip signal control in the 3D image system. A new 3D processing technique with flip-flops and clock buffers is proposed to generate the input image signals, as in Fig. 2. To enable system control and data pipelining, a synchronous signal system is used in the 3D RAM/ROM co-design system. As in Fig. 2, a synchronous clock buffer is used to push and delay the input clock signal. Serial flip-flops are also used to create synchronous signals under input clock control. The signal phase can be adjusted, and the synchronous output keeps all system signals in the same operation sequence as the input clock. With the proposed synchronous architecture, the RAM/ROM synthesis design method can realize global synchronous control, and the image processing system can be pipelined to accelerate chip operation.

Fig. 2. Synchronous clock buffer and pipeline signal adjustment.

B. Pipeline Latch System

In a 3D image processing system, pipeline thread mismatch can happen frequently and causes system processing faults. To keep the 3D pipeline process in step, a synchronous system with precise instruction control is recommended. The proposed method of replacing a common latch with a pipeline flip-flop is described in Fig. 3. As the waveforms illustrate, input signals cannot always stay synchronous with the clock signal. The output then cannot easily produce synchronous data, which causes system mismatch. The Pipeline Flip-Flop (PFF) method is proposed, in which a data switch module controls the output signal according to the combination of input signals. The related Karnaugh map is also given in Fig. 3 to show the detailed switch selection. In addition, a new D-FF module replaces the common RS-FF latch to enable signal synchronization and the 3D system pipeline process.

Fig. 3. Pipeline Flip-Flop architecture and latch system process method.
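The difference between a level-sensitive latch and an edge-triggered D flip-flop, and the delay produced by serial flip-flops, can be illustrated with a small behavioral model. The following Python sketch is only an idealized illustration (discrete time steps, zero gate delay); the function names and waveforms are hypothetical and are not taken from the chip design or from the PFF circuit of Fig. 3.

```python
def rising_edges(clk):
    """Return the step indices at which the clock goes from 0 to 1."""
    prev, edges = 0, []
    for i, c in enumerate(clk):
        if prev == 0 and c == 1:
            edges.append(i)
        prev = c
    return edges


def transparent_latch(clk, d, q0=0):
    """Level-sensitive latch: the output follows d whenever clk is high."""
    q, out = q0, []
    for c, x in zip(clk, d):
        if c == 1:
            q = x
        out.append(q)
    return out


def d_flip_flop(clk, d, q0=0):
    """Edge-triggered D flip-flop.

    On each rising clock edge the output takes the input value that was
    present one step before the edge (an idealized setup time).
    """
    q, prev_c, prev_d, out = q0, 0, q0, []
    for c, x in zip(clk, d):
        if prev_c == 0 and c == 1:
            q = prev_d
        prev_c, prev_d = c, x
        out.append(q)
    return out


def change_points(wave):
    """Return the step indices at which a waveform changes value."""
    return [i for i in range(1, len(wave)) if wave[i] != wave[i - 1]]


if __name__ == "__main__":
    clk = [0, 0, 1, 1] * 4  # clock period of four steps
    d = [0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0]  # asynchronous input
    print("clock edges:", rising_edges(clk))
    print("latch      :", transparent_latch(clk, d))
    q1 = d_flip_flop(clk, d)
    print("flip-flop  :", q1)
    print("two in series:", d_flip_flop(clk, q1))
```

In this model the latch output can change in the middle of a clock-high phase, whereas the flip-flop output changes only at rising edges, and a second flip-flop in series reproduces the first output one clock period later.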

C. 3D RAM/ROM Reconfigurable Memory System

The overall RAM/ROM system configuration for the 3D image processing system is illustrated in Fig. 4. The input image data are stored in the frame memory and the inner-chip data memory. Through the interconnection network between adjacent layers, image data are sent to four Processing Elements (PEs) for 3D pipelined operation. The output image signal is sent out through the system output interface. To control inner-chip data, a control unit and a RISC processor realize the signal pipeline and data flow. The configuration memory is used to insert the reconfiguration signal and enable 3D reconfigurable image processing. The RAM/ROM synthesis design system can also write/read input image data to and from the inner memory modules directly, and direct control instructions through the 3D layers can also improve image chip performance.


Fig. 4. RAM/ROM reconfigurable modules for 3D image system.

D. Whole Chip Architecture for 3D Image System

The VLSI chip architecture of the 3D image processing system is given in Fig. 5. The input data, address information, and control signals enter the input switch of the VLSI processor chip. Through the control module and SRAM module, the input image data undergo pipelined image processing. Through the output switch module, image data are sent out to construct the new output image. The control module in Fig. 5 consists of several modules, including the frame memory, four PE modules, MAIN memory, INST memory for instant data processing, and CONFIG memory for reconfigurable data processing. The inner image data are pipelined through the frame memory and PE modules. The image data are fetched from the MAIN memory module. The neighboring INST memory and CONFIG memory are used together to control the pipeline thread and the reconfiguration sequence. The additional SRAM modules store image data and control instructions so that the outside system can directly control the inner-chip modules. With the proposed chip architecture and memory modules, the RAM and ROM synthesis architecture in the inner control module realizes system control and a precise data pipeline.

IV. RECONFIGURABLE MEMORY SYSTEM

In the proposed 3D image processing system, RAM and ROM are used together to realize image data write/read and inner signal control. The synchronous clock buffer and pipeline flip-flop element are applied to image data operation for system instruction insertion and memory data fetch. To realize synchronous system control in the 3D image processing system, a new 3D reconfigurable memory system is proposed to enable image data reconfiguration and self-repairable system operation. Fig. 6 illustrates a typical 3D stacking architecture for the sensor and reconfigurable memory. A common sensor network is used to capture the input image data, including static pictures and dynamic moving images. The sensor image data are transferred by the A/D converter layer and interconnect network to the next function layer, as shown in Fig. 1.

Fig. 5. Whole chip architecture and RAM/ROM module design.

The next image processing layer is divided into several frame memory blocks, as in Fig. 6. Different target image data are assembled into related reconfigurable memory blocks. The separated memory blocks differ from each other and are matched to the particular input image data. If the image data operation encounters problems, such as image data loss or picture damage, neighboring memory blocks are recombined to remove the erroneous image blocks and enable re-healing or self-repairable image processing. The image data being processed and the reconfiguration instructions are controlled by the processing element layer. Thus the 3D reconfigurable memory system can realize precise image processing and raise whole-system robustness.

Fig. 6. 3D vertical stacking layer for sensor and reconfigurable memory.

V. PROCESSOR CONTROL SYSTEM DESIGN

The Processing Element (PE) layer in the 3D image processor system controls image memory configuration and pipelined data flow, as shown in Fig. 7. To realize direct system control and reduce inter-layer transmission loss, processor modules and their related memory blocks are mostly stacked in the same vertical column. Input image data from the outside sensor layer are converted by the following A/D converter layer. The analog image signal is converted to pipelined digital image data in the following column memory block. The image data flow vertically from the top layer to the bottom layer, while system control instructions travel in the opposite direction, from the bottom layer to the top layer.


The 3D reconfigurable RAM/ROM memory layer can be controlled easily, and input image data can be processed with a fast pipeline thread and flexible inner instructions. Many related reconfigurable processor systems are used in recent VLSI processing systems. Similarly, the processor layer has many processing elements, and a reconfigurable system is applied to realize reconfigurable image operation. As in Fig. 7, the vertical data flow is controlled by the adjacent processor layer and frame memory layer. The Processing Element (PE) layer enables the related memory combination and data block partition. Depending on the image operation requirements, the frame memory layer can be divided into several blocks to store pipelined image data. ROM and RAM modules can also be combined to constitute a whole memory block and realize self-repairable image operation. Image data accuracy and system processing speed are improved by precise processor control and the 3D reconfigurable architecture.

Fig. 7. 3D processor control layer and reconfigurable memory system.

VI. NETWORK-ON-CHIP DESIGN

In advanced VLSI system research, system processing speed, whole-chip area, and power consumption have become the critical design challenges. Recently, 3D stacking layer architecture and network-on-chip design have been considered as ways to improve system efficiency. Further research focuses on combined system design for 3D architecture and network-on-chip systems. Thus a new 3D interconnect network architecture is proposed in this paper to improve layer stacking flexibility and whole-system performance.

In the practical 3D image processing chip, we improve the 3D network-on-chip design based on the layer architecture in Fig. 8. In a complex stacking layer system, many different function layers are connected with specific Through-Silicon Via (TSV) architectures. Several stacking layer types are used with related TSV connection structures, including inter-layer TSV, trans-layer TSV, and multi-layer TSV. As in Fig. 8, a neighboring layer connection is an inter-layer TSV network, which connects adjacent layers by specific silicon vias and an interconnect network. Another type is the trans-layer TSV, which passes through a neighboring function layer and connects the corresponding layers by a trans-connection silicon via, such as the reconfigurable memory layer and processing element layer in Fig. 8. In addition, a further TSV tunnel design can connect several layers together and realize the multi-layer TSV type with assembled stacking layer connection. The three-layer assembled connection for the A/D converter, power network, and reconfigurable memory layers in Fig. 8 shows a typical multi-layer TSV architecture. Also, the four-layer TSV network from the power network layer to the processing element layer illustrates a more complex multi-layer structure in the seven-layer image processing system of Fig. 8.

Fig. 8. 3D network-on-chip image system with TSV layer architecture.

VII. SELF-REPAIRABLE VLSI AND DEPENDABLE RECONFIGURABLE SYSTEM

A dependable reconfigurable VLSI system is a recent research hotspot for high-performance processor systems. In a practical image VLSI chip, data operation errors can happen frequently and cause serious problems that degrade whole-system performance. To solve the image data mismatch and processing error problems, self-repairable methods and re-healing design technologies are applied in our practical chip design.

Fig. 9. 3D image processing with reconfigurable self-repairable system.

A common robust design method for repairing VLSI system errors is reconfigurable re-healing technology. As in Fig. 9, the image data being processed are damaged in the center part of the image blocks. The reconfigurable self-repair method checks the vertical image blocks to get the detailed error address. The horizontal image blocks are also identified by memory data


scanning to get the required image data, which are used to repair the erroneous image blocks with suitable re-healing methods. After the damaged image data and related address blocks are determined by image block sweeping, the specific erroneous image blocks are reconfigured, and neighboring memory blocks are used to replace them and repair the damaged image data according to the system design target and related image information. When the erroneous image data have been corrected in the corresponding image memory blocks, the whole VLSI system enters reconfigurable operation again to recover the original image block architecture. The repaired image blocks are assembled together and used to construct the new center image block. The border separation is removed, and the four corrected image sub-blocks are composited to complete the image re-healing operation.

The critical points in VLSI self-repairable design are the reconfigurable block area and the repair control sequence. The VLSI re-healing performance is determined by the design requirements and system robustness. If a large memory area is used for image repair, the correction efficiency increases, but the block searching time is extended, with high power consumption and large chip area. Conversely, if new dependable design technologies such as a compact reconfigurable border and small image sub-blocks are used, operating power and chip area can be reduced greatly. However, the related reconfigurable processing then cannot always achieve a successful repair, and system robustness falls rapidly. Thus, in the common repair method, the selected image border varies from three to five pixels around the detected erroneous image block.
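The scan-and-replace idea described above can be sketched in software. The following Python example is a minimal illustration, not the chip's repair logic: it assumes that damaged pixels are already flagged in a boolean mask (the paper does not specify how errors are detected), it handles a single error block, and it uses a simple mean of the intact pixels in the surrounding border as the replacement value. The names `find_error_block`, `repair_block`, and the `border` parameter are hypothetical.

```python
def find_error_block(mask):
    """Scan rows and columns of a damage mask.

    Returns the inclusive bounding box (top, bottom, left, right) of the
    flagged pixels, or None when nothing is flagged.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], rows[-1], cols[0], cols[-1]


def repair_block(image, mask, border=3):
    """Return a repaired copy of image.

    Damaged pixels are filled with the rounded mean of the intact pixels
    inside the error block's bounding box enlarged by `border` pixels
    (assumed repair rule, chosen for simplicity).
    """
    box = find_error_block(mask)
    repaired = [row[:] for row in image]
    if box is None:
        return repaired
    top, bottom, left, right = box
    height, width = len(image), len(image[0])
    r0, r1 = max(0, top - border), min(height - 1, bottom + border)
    c0, c1 = max(0, left - border), min(width - 1, right + border)
    intact = [image[r][c]
              for r in range(r0, r1 + 1)
              for c in range(c0, c1 + 1)
              if not mask[r][c]]
    if not intact:
        raise ValueError("no intact pixels available for repair")
    fill = round(sum(intact) / len(intact))
    for r in range(top, bottom + 1):
        for c in range(left, right + 1):
            if mask[r][c]:
                repaired[r][c] = fill
    return repaired
```

A wider `border` draws on more neighboring data at the cost of more pixels to scan, which mirrors the trade-off between repair quality and search time discussed in the text.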

The specific repair sequence also determines the whole-system operation and processing efficiency of the 3D reconfigurable image VLSI system. In the detailed control sequence, the first operation is scanning the image range and searching for the addresses of repairable image blocks. The second is the reconfigurable operation that separates the detected damaged image blocks and constructs neighboring memory blocks. The third is re-healing processing and reconfiguration of the new image block by the related memory interconnection and image repair method. Finally, memory blocks and processing elements are combined again to recover the original image VLSI system. The re-healed sub-image blocks are assembled together, and the improved image results are produced by the repairable processing VLSI and reconfigurable image system.

VIII. CHIP SIMULATION AND EXPERIMENTAL RESULTS

A. Image Processor Chip Design and Simulation

Based on the synchronous improvements to the clock buffer and flip-flop latch in the 3D stacking layers, we designed a new image processing chip in 0.13 µm technology. Fig. 10 shows the layout micrograph of the manufactured VLSI chip. The chip has 208 pins and measures 5000 µm in length and 5000 µm in width. The gate count is about 980,000 and chip utilization is 20.655%. The chip clock frequency is 50 MHz, and the practical processing frequency is 25 MHz. The interconnect distribution uses 8 metal layers, including the power mesh network, clock tree, and other function signal wires.

Fig. 10. Chip layout for 3D reconfigurable image processing system.

Fig. 11. VDD drop results for IR-drop verification without EM violation.

Fig. 12. VSS rise results for IR-drop verification without EM violation.

For IR-drop verification with zero EM violations, the practical switching rate is 20% under 50 MHz clock control. The power consumption is 247.728 mW with a 1.2 V power supply. In the VDD-drop simulation of Fig. 11, the worst drop value is 13.802 mV, a 1.15% drop rate. Similarly, the worst rise value


is 9.919 mV, and the related rise rate is 0.827%, in the VSS-rise simulation of Fig. 12. Based on the VDD-drop and VSS-rise simulations, the image processor chip can realize suitable image operation without excessive disturbance from signal floating and drop/rise variance. 3D layer stacking in the practical processor chip can also be realized easily and assembled successfully, with precise image data adaptation and smooth inter-layer signal transmission.

B. Test Board Experiments and Simulation

We also designed a test board to obtain practical experimental results after the 3D image processing chip was manufactured. A photograph of the implemented test board is shown in Fig. 13. The image test board system consists of a base board, socket, input/output ports, interface, and the image processing chip. The base board is a 4-layer experimental board, 180 mm long and 180 mm wide. The socket is embedded in the center of the base board, and the chip under test is inserted in the socket with tight pin contact. Around the socket, we placed four columns of input/output ports, which are used to write/read image data, memory addresses, and control instruction signals. Through the outside computer interface port, we can also control the test board and access board signals for 3D image processing system simulation.
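The drop and rise rates quoted for the IR-drop verification in Section VIII-A can be checked by expressing each worst-case value as a percentage of the 1.2 V supply. The short Python sketch below does this arithmetic; the average supply current it also derives is not stated in the paper and assumes that all of the reported power is drawn from the 1.2 V supply.

```python
SUPPLY_V = 1.2           # supply voltage reported in the text
POWER_MW = 247.728       # simulated power consumption
VDD_DROP_MV = 13.802     # worst VDD drop (Fig. 11)
VSS_RISE_MV = 9.919      # worst VSS rise (Fig. 12)


def percent_of_supply(delta_mv, supply_v=SUPPLY_V):
    """Express a voltage deviation in millivolts as a percentage of the supply."""
    return 100.0 * (delta_mv / 1000.0) / supply_v


def average_current_ma(power_mw, supply_v=SUPPLY_V):
    """Average supply current in mA, from P = V * I (derived, assumed single supply)."""
    return power_mw / supply_v


if __name__ == "__main__":
    print(f"VDD drop rate: {percent_of_supply(VDD_DROP_MV):.3f} %")
    print(f"VSS rise rate: {percent_of_supply(VSS_RISE_MV):.3f} %")
    print(f"Average current: {average_current_ma(POWER_MW):.2f} mA")
```

Both percentages agree with the reported 1.15% and 0.827% after rounding.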

Fig. 13. Practical chip test board for 3D image processing system.

Practical test results for the 3D image processing chip, captured by the computer monitor system, are given in Fig. 14 and Fig. 15. We drive the inner image data bus with a 1-bit step increment input through the inserted SRAM modules of Fig. 5. The SRAM output is likewise updated step by step with a 1-bit data change. The ladder increment results in Fig. 14 demonstrate a correct write and read procedure directly from outside into the chip. System controllability is greatly increased, and globally pipelined operation can be realized easily with immediate outside instruction control. Inner memory in the processor module, such as the Main memory and instant memory (INST memory) in Fig. 5, can also access outside data signals directly. In addition, Fig. 15 shows the processing waveform for input data read and output data write. The Main memory holds the main image processing data, and the INST memory data are the adjustment instructions for image system reconfiguration. The experimental waveforms show that outside data and control instructions can be inserted into the inner RAM modules and fetched by the outside system. Thus we can realize better data control and faster pipeline operation to increase whole 3D image chip performance.

Fig. 14. SRAM data output with 1 bit step increment.

Fig. 15. Image processing memory output data for write and read.
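The step-increment write/read test described above can be mimicked with a small software model. The Python sketch below is a hedged illustration only: `SramModel`, `ladder_test`, and `is_ladder` are hypothetical names, the memory depth and word width are free parameters rather than the chip's values, and the model ignores timing entirely.

```python
class SramModel:
    """Behavioral model of a word-addressable SRAM (no timing)."""

    def __init__(self, depth, width_bits):
        self.depth = depth
        self.mask = (1 << width_bits) - 1  # keeps values within the word width
        self.cells = [0] * depth

    def write(self, addr, value):
        if not 0 <= addr < self.depth:
            raise IndexError("address out of range")
        self.cells[addr] = value & self.mask

    def read(self, addr):
        if not 0 <= addr < self.depth:
            raise IndexError("address out of range")
        return self.cells[addr]


def ladder_test(sram, start=0, count=16):
    """Write start, start+1, ... to consecutive addresses and read them back."""
    for i in range(count):
        sram.write(i, start + i)
    return [sram.read(i) for i in range(count)]


def is_ladder(values):
    """True when every value exceeds the previous one by exactly 1."""
    return all(b - a == 1 for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    sram = SramModel(depth=64, width_bits=32)
    readback = ladder_test(sram, start=0, count=16)
    print(readback, "ladder OK" if is_ladder(readback) else "mismatch")
```

A read-back sequence that fails `is_ladder` would indicate a write or read fault on the path from the outside interface to the inner memory.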

C. Image Simulation and Conversion Results

Image simulation results from the 3D reconfigurable image chip are given in Fig. 16. Many images were tested for conversion simulation using the practical image chip and a computer simulation program. In the image conversion experiments of Fig. 16, we tested the typical test image named Cameraman, which shows a man using a camera to photograph his surroundings. Based on specific picture conversion and related internal image processing, the Cameraman image can be compressed rapidly.


The picture can be used for the next image operation to enhance display precision and system processing performance. To realize data transformation and image compression, we use specific MPEG image algorithms to extract the figure edges and obtain the corresponding thresholded figure for final image processing. Fig. 16(b) and Fig. 16(c) illustrate the detailed image conversion and data transmission, respectively. The practical image data can be processed quickly, realizing fast image processing for super-high-speed camera design. The 3D network-on-chip architecture ensures fast system speed and improves data transmission efficiency. Other MPEG/JPEG image processing methods, such as the DCT/IDCT algorithm, pipelined image operation, the multi-layer stacking method, and reconfigurable self-repairable memory, can also be applied in the new 3D image processing system.
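Edge extraction followed by thresholding can be illustrated with a standard operator. The Python sketch below uses the Sobel operator as a stand-in, since the paper does not specify which edge algorithm the chip implements; the function names and the threshold level are assumptions made for illustration.

```python
def sobel_magnitude(img):
    """Approximate gradient magnitude |Gx| + |Gy| using the Sobel kernels.

    img is a list of rows of grayscale values. Border pixels are left at 0
    because the 3x3 kernel does not fit there.
    """
    height, width = len(img), len(img[0])
    out = [[0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = (img[r - 1][c + 1] + 2 * img[r][c + 1] + img[r + 1][c + 1]
                  - img[r - 1][c - 1] - 2 * img[r][c - 1] - img[r + 1][c - 1])
            gy = (img[r + 1][c - 1] + 2 * img[r + 1][c] + img[r + 1][c + 1]
                  - img[r - 1][c - 1] - 2 * img[r - 1][c] - img[r - 1][c + 1])
            out[r][c] = abs(gx) + abs(gy)
    return out


def threshold(img, level):
    """Binarize: 1 where the value is at least `level`, otherwise 0."""
    return [[1 if v >= level else 0 for v in row] for row in img]


if __name__ == "__main__":
    # Synthetic test image with a vertical step edge between columns 2 and 3.
    step = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
    for row in threshold(sobel_magnitude(step), level=512):
        print(row)
```

On the synthetic step image the thresholded result marks only the two columns adjacent to the intensity step, which is the behavior expected of an edge map.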

Fig. 16. Image conversion results for 3D image processing chip.

D. Reconfigurable Image Self-repairable Processing

In practical image chip tests, image precision problems and picture distortion happen frequently. Common image errors are generated during signal processing and data transformation in the 3D image processing system. Robust image self-repairable technology is therefore necessary for high-performance image chip design. Fig. 17 shows typical image repair methods in our 3D image processing system. Six pictures, Fig. 17(a) to Fig. 17(f), explain the detailed processing sequence for picture data recovery and the image re-healing results. The first test picture, the plane image in Fig. 17(a), is the original experimental picture. In Fig. 17(b), the two square blocks represent image errors in the test picture. Data scanning is needed to locate them within the picture. Next, reconfigurable technology is used to replace erroneous image blocks with neighboring image parts, as in Fig. 17(c). Through the related image block repair, erroneous image blocks are removed and new image blocks are assembled in a specific sequence to recover the previous image part. A similar operation is then carried out for the other error block in Fig. 17(d). The following step, with reconfigurable memory blocks and image re-healing operation, repairs the image data in Fig. 17(e). Finally, the whole test picture is recovered to the correct image data, as shown in Fig. 17(f). If there are numerous image errors and a large picture area, the reconfigurable self-repairable operation in the 3D image VLSI chip proceeds step by step in a sequence similar to Fig. 17.

The self-repairable technology can enhance whole image system performance. It can also heal erroneous picture parts after 3D image system processing and inter-layer picture data transmission. Image size and error count influence the data recovery efficiency and output image quality. Thus the synthesis image operation, including picture data processing and the reconfigurable repairable system, is our main design contribution in the practical 3D image VLSI system.

Fig. 17. Image self-repairable processing for reconfigurable VLSI system.

E. Memory Allocation and Chip Size

Practical image operation and the whole VLSI system are realized by the related RAM/ROM memory blocks in our 3D reconfigurable image chip. The chip size is also determined by the memory allocation area and the number of RAM/ROM blocks. Table I shows the memory information and allocation sequence in detail for our image processing chip. Based on the 3D chip architecture of Fig. 5, frame memory blocks (FMem) are used to store image frame data. The FMem module has a total data capacity of 8 KB, with 16 frame memories of 256×8 bit unit size each. The area of each FMem block is 46600 µm². The whole FMem block area is 745600 µm², as shown in Table I. Similarly, the data memory (DMem) and I/O memory (IOMem) have the same size and area allocation in our 3D image chip. The important configuration memory (CMem) has a block size of 4096×40 bit and a chip area of 587600 µm². The related processing memories, such as the main memory (MMem), pipeline instruction memory (PMem), and table memory (TMem) in Table I, also use


SRAM modules to process image data directly in the whole VLSI system. The pipeline instructions are handled in the PMem module, and the TMem block provides the system table memory that stores intermediate image data for the next reconfigurable image processing step. Furthermore, the additional SRAM module in Fig. 5 has a large 8192×32 bit size and uses two 32-bit fast memory blocks. Each inserted SRAM block occupies 212800 µm²; these blocks realize direct image data and instruction fetch operations into our 3D image processing system.

In summary, the memory blocks comprise nine module types with a total size of 106 KB and 70 memories. The whole memory area is about 4250600 µm² by size accumulation and system allocation. If the chip peripheral ring area is also considered in the practical 3D image system, the additional allocated area is about 1000000 µm², which is about 25% of the area of the entire memory system. Thus the whole image chip area is more than 5000000 µm² with the detailed memory size allocation. If additional core processing elements are also allocated, the whole chip area increases further, and global synthesis design between memory and processor will become a future research challenge. Robust image processing, such as self-repairable operation and the re-healing method, must also take memory allocation and module area in the whole 3D image chip into account.
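The block count and area totals in this subsection can be cross-checked by accumulating the per-block figures listed in Table I. The Python sketch below performs that accounting; the data values are copied from Table I and the text, while the variable and function names are illustrative.

```python
# (memory name, number of blocks, area per block in um^2), from Table I
MEMORY_BLOCKS = [
    ("FMem 1", 16, 46600),
    ("FMem 2", 16, 46600),
    ("DMem", 16, 46600),
    ("CMem", 1, 587600),
    ("MMem", 1, 133000),
    ("PMem", 1, 75400),
    ("IOMem", 16, 46600),
    ("TMem", 1, 46600),
    ("SRAM", 2, 212800),
]

RING_AREA_UM2 = 1_000_000  # approximate peripheral ring area from the text


def total_blocks(blocks):
    """Total number of memory blocks over all module types."""
    return sum(count for _, count, _ in blocks)


def total_area(blocks):
    """Total memory area in um^2 (blocks times area per block, summed)."""
    return sum(count * area for _, count, area in blocks)


if __name__ == "__main__":
    memory_area = total_area(MEMORY_BLOCKS)
    print("module types:", len(MEMORY_BLOCKS))
    print("memory blocks:", total_blocks(MEMORY_BLOCKS))
    print("memory area (um^2):", memory_area)
    print("memory + ring (um^2):", memory_area + RING_AREA_UM2)
    print(f"ring / memory area: {RING_AREA_UM2 / memory_area:.1%}")
```

The accumulation reproduces the nine module types, 70 blocks, and 4250600 µm² of Table I, and adding the ring area gives a total above 5000000 µm², consistent with the text; the ring comes out at roughly a quarter of the memory area.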
TABLE I
MEMORY SIZE AND CHIP AREA ALLOCATION

Memory   Size          Num   Total     Area (µm²)   Whole (µm²)   Detail
FMem 1   256×8 bit     16    8 KB      46600        745600        Frame
FMem 2   256×8 bit     16    8 KB      46600        745600        Frame
DMem     256×8 bit     16    8 KB      46600        745600        Data
CMem     4096×40 bit   1     20 KB     587600       587600        Config.
MMem     4096×40 bit   1     20 KB     133000       133000        Main
PMem     256×40 bit    1     1.25 KB   75400        75400         Pipe. Inst.
IOMem    256×8 bit     16    8 KB      46600        745600        IO Buf
TMem     256×24 bit    1     0.75 KB   46600        46600         Table
SRAM     8192×32 bit   2     32 KB     212800       425600        SRAM
Total    Nine types    70    106 KB    -            4250600       Total

F. Discussion and Future Challenge

Direct memory insertion in the 3D image processing system can improve system robustness and control of inner data operation. As in Table II, a common 2D planar image system has a large chip size and slow processing speed. Its image data and operation instructions are handled in an ordinary sequence, and its robustness is insufficient without suitable self-repairable capability and re-healing features. Compared with a common 2D image system, our proposed 3D architecture reduces chip size and increases processing speed. The 3D operation system can also realize fast parallel processing and robust image operation for consumer electronics image products.

In this paper, we propose a new 3D RAM/ROM image system with reconfigurable operation memory and a 3D network-on-chip architecture. The designed chip can insert image data or control instructions into inner-chip modules directly. The image data being processed can also be sent out immediately, and inner control instructions can be monitored for reconfigurable system processing. The whole 3D VLSI system can be accelerated with very fast parallel operation and direct data control. The 3D image system also has self-repairable and re-healing features that realize a dependable reconfigurable VLSI system and highly robust image operation, as in Table II.

TABLE II
COMPARISON FOR 2D AND 3D IMAGE PROCESSING SYSTEM

System                                   Size         Speed       Data operation                                         Robustness
2D image processing system               large        slow        sequence thread and parallel operation                 low
3D ROM image processing system           very small   fast        fast parallel operation                                common
3D RAM/ROM reconfigurable image system   small        very fast   very fast parallel operation and direct data control   highly robust system and precise self-repairable design

One drawback of the 3D network-on-chip architecture is the complex critical data path from input ports to output ports. The extended signal routes cause data transmission loss and can degrade the performance of the whole VLSI system. Additional clock buffer insertion and pipeline Flip-Flop latch replacement introduce inner signal delay and waste inner processing time. The frequency of the whole pipeline system is then reduced, and image data processing slows down because of the critical data path delay. In addition, complex network connections can also affect system data transmission and introduce inter-chip data mismatching.

Another weak point of our 3D chip system is layer stacking efficiency. Different image function layers have their own layer features and connection methods. The stacking sequence and the relation between neighboring layers also require precise design and system-level consideration. More stacking layers can realize small chip size, fast operation speed, and compact image operation. Global system synthesis with different stacking layers and related function combinations is a current design hotspot. In addition, low-power systems and highly robust chips will also become important research targets in the future.

Thus our future design challenges for the next 3D image system will focus on several significant targets, such as reducing the critical data path, decreasing inter-layer connection complexity, and accelerating image processing speed. The pipeline thread of the inner image elements will be studied, and the delay path will be divided for the inner pipeline process. Multi-layer stacking technology is also a future research topic, with detailed layer combination and specific TSV tunnel design. Consequently, the design of the processing element modules and the reconfigurable data sequences will be adjusted precisely to satisfy the new data path flow and highly pipelined image operation in our future complex 3D system research.

IX. CONCLUSION
In this paper, a new reconfigurable system with RAM/ROM memory modules and a 3D layer architecture is proposed for a highly pipelined image processing chip. Flexible data flow and direct system control can be realized by precise data fetch in RAM and ROM memory. The synchronous clock buffer and


pipeline Flip-Flop module are used to adjust the 3D system processing flow. The new 3D stacking layer architecture can also be applied to reduce image chip size and increase system pipeline speed. The additional 3D network-on-chip connection system can satisfy 3D chip stacking requirements and enable global pipeline operation for a multi-layer VLSI image system. The experimental results in this paper illustrate that the new 3D reconfigurable memory system can handle inner data and control instruction signals directly for a dependable VLSI chip. Further robustness methods, including self-repairable operation and a re-healing system, are also used in the proposed 3D image processing system. Future challenges will focus on critical path reduction, fast pipeline thread construction, complex multi-layer stacking methods, and research on highly robust self-dependable VLSI systems.

ACKNOWLEDGMENT
The author thanks Mr. S. Kodama, Mr. C. Naito, Mr. H. Ueda, and Dr. K. Kiyoyama for their research cooperation. The author also thanks Prof. T. Tanaka, Prof. M. Koyanagi, and Prof. M. Esashi of Tohoku University, Japan, for their supervision.

REFERENCES
[1] D. Doswald, J. Hafliger, P. Blessing, N. Felber, P. Niederer, and W. Fichtner, "A 30-frames/s megapixel real-time CMOS image processor," IEEE J. Solid-State Circuits, vol. 35, no. 11, pp. 1732-1743, Nov. 2000.
[2] M. Koyanagi, Y. Nakagawa, K.-W. Lee, T. Nakamura, Y. Yamada, K. Inamura, K. Ki-Tae Park, and H. Kurino, "Neuromorphic vision chip fabricated using three-dimensional integration technology," in Proc. ISSCC Dig. Tech. Papers, Feb. 2001, pp. 270-271, 454.
[3] J. W. Joyner, P. Zarkesh-Ha, and J. D. Meindl, "Global interconnect design in a three-dimensional system-on-a-chip," IEEE Trans. VLSI Systems, vol. 12, no. 4, pp. 367-372, Apr. 2004.
[4] M. Koyanagi, T. Fukushima, and T. Tanaka, "High-density through silicon vias for 3-D LSIs," Proceedings of the IEEE, vol. 97, no. 1, pp. 49-59, Jan. 2009.
[5] K. Kiyoyama, Y. Ohara, K.-W. Lee, Y. Yang, T. Fukushima, T. Tanaka, and M. Koyanagi, "A parallel ADC for high-speed CMOS image processing system with 3D structure," in Proc. IEEE Int. Conf. 3D System Integration, Sep. 2009, pp. 1-4.
[6] T. Sugimura, Y. Konishi, J. Deguchi, T. Ishihara, T. Fukushima, A. Konno, M. Uchiyama, and M. Koyanagi, "Design of parallel reconfigurable image processor with three-dimensional structure," IEICE Trans. Inf. Syst., vol. J89-D, no. 6, pp. 1141-1152, Jun. 2006.
[7] D. Amano, T. Sugimura, Y. Konishi, T. Fukushima, T. Tanaka, and M. Koyanagi, "Reconfigurable stacked memory system for parallel image processing using three-dimensional LSI technology," in Proc. IPSJSLDM, Oct. 2006, pp. 147-152.
[8] D. Lattard, E. Beigne, F. Clermidy, Y. Durand, R. Lemaire, P. Vivet, and F. Berens, "A reconfigurable baseband platform based on an asynchronous network-on-chip," IEEE J. Solid-State Circuits, vol. 43, no. 1, pp. 223-235, Jan. 2008.
[9] T. Komuro, S. Kagami, and M. Ishikawa, "A dynamically reconfigurable SIMD processor for a vision chip," IEEE J. Solid-State Circuits, vol. 39, no. 1, pp. 265-268, Jan. 2004.
[10] S. Kodama, D. Amano, T. Sugimura, T. Fukushima, T. Tanaka, and M. Koyanagi, "New reconfigurable memory architecture for parallel image-processing LSI with three-dimensional structure," Japanese J. Applied Physics, vol. 47, no. 4, pp. 2774-2778, Apr. 2008.
[11] D. Kim, Z. Fu, J. H. Park, and E. Culurciello, "A 1-mW CMOS temporal-difference AER sensor for wireless sensor networks," IEEE Trans. Elec. Devices, vol. 56, no. 11, pp. 2586-2593, Nov. 2009.
[12] J. Guo and S. Sonkusale, "A high dynamic range CMOS image sensor for scientific imaging applications," IEEE J. Sensors, vol. 9, no. 10, pp. 1209-1218, Oct. 2009.
[13] H. Singh, M. Lee, G. Lu, F. Kurdahi, N. Bagherzadeh, and E. Filho, "MorphoSys: an integrated reconfigurable system for data-parallel and computation-intensive applications," IEEE Trans. Computers, vol. 49, no. 5, pp. 465-481, Nov. 2009.
[14] H. Kondo, M. Nakajima, N. Masui, S. Otani, N. Okumura, Y. Takata, T. Nasu, H. Takata, T. Higuchi, M. Sakugawa, H. Fujiwara, K. Ishida, K. Ishimi, S. Kaneko, T. Itoh, M. Sato, O. Yamamoto, and K. Arimot, "Design and implementation of a configurable heterogeneous multicore SoC With nine CPUs and two matrix processors," IEEE J. Solid-State Circuits, vol. 43, no. 4, pp. 892-901, Jan. 2008.
[15] H. Kanbara, R. Kinjo, Y. Toda, H. Okuhata, and M. Ise, "Dependable embedded processor core for higher reliability," in Proc. IEEE Int. Symp. Consumer Electronics, May 2009, pp. 819-822.
[16] O. J. Kuiken, X. Zhang, and H. G. Kerkhoff, "Built-in self-diagnostics for a NoC-based reconfigurable IC for dependable beamforming applications," in Proc. IEEE Int. Symp. Defect and Fault Tolerance of VLSI Systems, Oct. 2008, pp. 45-53.
[17] I. Loi, S. Mitra, T. H. Lee, S. Fujita, and L. Benini, "A low-overhead fault tolerance scheme for TSV-based 3D network-on-chip links," in Proc. IEEE/ACM Int. Conf. CAD, Nov. 2008, pp. 598-602.
[18] F. Li, C. Nicopoulos, T. Richardson, Y. Xie, V. Narayanan, and M. Kandemir, "Design and management of 3D chip multiprocessors using network-in-memory," in Proc. Int. Symp. Computer Architecture, Jun. 2006, pp. 130-141.
[19] B. Feero and P. P. Pande, "Performance evaluation for three-dimensional networks-on-chip," in Proc. IEEE Computer Society Annual Symp. VLSI, Mar. 2007, pp. 9-11.
[20] Y. Xu, Y. Du, B. Zhao, X. Zhou, Y. Zhang, and J. Yang, "A low-radix and low-diameter 3D interconnection network design," in Proc. IEEE Int. Symp. High Performance Computer Architecture, Feb. 2009, pp. 30-42.

BIOGRAPHY
Yun Yang (M'08) was born in Minhou county, Fujian province, China, in 1976. He received the B.S. degree in electronic engineering and the M.S. degree in microelectronic engineering from Fudan University, Shanghai, China, in 1998 and 2004, respectively, and the Dr. of Eng. degree in information, production and systems (IPS) from Waseda University, Kitakyushu, Japan, in 2008. From 1998 to 2000, he worked as a software engineer for Chuwa Software for Fujitsu Co., Ltd. in MPEG audio system design. He also worked as a Research Associate in the Information, Production and Systems Research Center (IPSRC), Waseda University, Kitakyushu, Japan, in 2008. Since then, he has worked as a Postdoctoral Researcher at Tohoku University, Sendai, Japan. His research interests include 3D VLSI design, EDA physical design, reconfigurable SoC systems, network-on-chip research, image processing systems, and computer pipeline architecture. He received the "Excellent Student Award of IEEE Fukuoka Section" in 2005. Dr. Yang became a Member (M) of IEEE in 2008. He is also a member of IEICE, ACM and AAAS.
