
Hardware-oriented Partition for Embedded Multiprocessor FPGA Systems

Trong-Yen Lee¹, Yang-Hsin Fan¹,², Yu-Min Cheng¹, Chia-Chun Tsai³ and Rong-Shue Hsiao¹
¹ Dept. of Electronic Engineering and Institute of Computer and Communication, National Taipei University of Technology, Taipei, Taiwan, R.O.C.
² Information System Section of Library, National Taitung University, Taitung, Taiwan, R.O.C.
³ Dept. of Computer Science and Information Engineering, Nanhua University, Chia-Yi, Taiwan, R.O.C.
{tylee, ymcheng, cct, rsh}

Abstract
In this paper, we present a hardware-oriented partitioning approach that solves the partitioning problem for embedded multiprocessor FPGA systems. While satisfying the system constraints, it obtains a better partitioning result, with faster execution time, less memory, and a higher slice utilization rate. The proposed method starts with all functionalities in software and then moves portions into hardware to obtain a better result. If a partitioning result does not satisfy the system constraints, the refinement phase sends the designer back to the partitioning phase until the result meets them. Each constraint is known to be hard to satisfy in embedded multiprocessor FPGA systems, so meeting all of them at once is a real challenge; our hardware-oriented method addresses it while providing faster execution time and smaller memory size. We demonstrate the feasibility of the approach with a JPEG encoding system on the Xilinx ML310 FPGA platform. Experimental results show that both execution time and memory size are improved.

1. Introduction
Field programmable gate arrays (FPGAs) are popular prototyping platforms for developing consumer electronics products. Modern FPGAs contain a multiprocessor system-on-a-chip (MPSOC), which gives them high performance and rich functionality. However, FPGA systems are becoming increasingly complex, because hardware and software must work together closely. In the traditional FPGA design process, hardware-software partitioning usually depends on the engineers' experience. System integration is then a major challenge, because high-level synthesis for hardware and compilation for software are carried out separately. Moreover, the partitioning result may not be the best solution in terms of execution time and memory size. As a result, system performance and functionality may be limited, and co-verification may be insufficient, because hardware and software are developed independently after the partitioning phase.

2. Preliminaries
Applying hardware-software codesign [1][2] to FPGAs has become a trend as the MPSOC era arrives. Hardware-software partitioning [3] is a significant codesign issue, because an MPSOC system must meet all the system constraints of its specification simultaneously. Partitioning approaches are classified [4] into software-oriented (SW-oriented) and hardware-oriented (HW-oriented) partitioning. The SW-oriented method starts with all functionalities in software and then moves portions into hardware if doing so yields a better partitioning result. In contrast, the HW-oriented method starts with all functionalities in hardware and then moves portions into software to obtain a better result. However, moving portions without any strategy cannot guarantee the best partitioning result.

0-7695-2882-1/07 $25.00 © 2007 IEEE

2.1 System model

A control and data flow graph (CDFG) is an acyclic graph composed of nodes and edges, and it is often used in high-level synthesis. Because of its graphical nature, it easily models data flow, control steps, and concurrent operations. We therefore use each CDFG node to represent a system hardware or software component in the hardware-oriented partitioning (HOP) approach. Figure 1 shows a CDFG example consisting of a control flow graph (CFG) and a data flow graph (DFG). Nodes a, b, …, l represent a set of function elements (FEs): a(S,5) denotes that function element a is implemented in software (S) with an execution time of 5, and b(H,1.5) denotes that function element b is implemented in hardware (H) with an execution time of 1.5. In this paper, the CDFG serves as the system model that is input to hardware-software partitioning.
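To make the node notation concrete, the following Python sketch stores each FE with its implementation and execution time and computes the longest-path execution time that the paper uses as its timing measure. It is a sketch under stated assumptions, not the authors' tool: node execution times are assumed to simply add along a path, communication delay on edges is ignored, and because the edges of Figure 1 cannot be recovered from the text, the four-node graph in the example is hypothetical (only the execution times of a, b, c, and l come from Table 2). The names `FE`, `parse_fe`, and `system_execution_time` are illustrative.

```python
import re
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class FE:
    """A CDFG node: a function element with its chosen implementation."""
    name: str
    impl: str    # "S" = software, "H" = hardware
    texe: float  # execution time of the chosen implementation


def parse_fe(label: str) -> FE:
    """Parse the node notation of Figure 1, e.g. 'a(S,5)' or 'b(H,1.5)'."""
    m = re.fullmatch(r"\s*(\w+)\(\s*([SH])\s*,\s*([0-9]+(?:\.[0-9]+)?)\s*\)\s*", label)
    if m is None:
        raise ValueError(f"not an FE label: {label!r}")
    return FE(m.group(1), m.group(2), float(m.group(3)))


def system_execution_time(nodes: dict, edges: dict) -> float:
    """Longest path through an acyclic graph, summing node execution times.

    Assumption: communication delays on edges are ignored.
    """
    @lru_cache(maxsize=None)
    def longest_from(name: str) -> float:
        successors = edges.get(name, [])
        return nodes[name].texe + max((longest_from(s) for s in successors), default=0.0)

    return max(longest_from(name) for name in nodes)


# Hypothetical four-node graph (Figure 1's edges are not recoverable);
# the execution times of a, b, c and l follow Table 2.
nodes = {fe.name: fe for fe in map(parse_fe, ["a(S,5)", "b(H,1.5)", "c(H,2)", "l(S,4)"])}
edges = {"a": ["b", "c"], "b": ["l"], "c": ["l"]}
print(system_execution_time(nodes, edges))  # 11.0, via the path a -> c -> l
```

Memoizing `longest_from` keeps the computation linear in the number of edges, which matters once a CDFG has many reconverging paths.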

Figure 1. Partitioning result for a CDFG case (control steps 1 to 4; system execution time = 27)

2.2 System constraints

The constraints on hardware-software partitioning are execution time, cost, power consumption, and the number of processors in the system design. The execution-time constraint applies to the longest path execution time in the CDFG, such as the path from a to l in Figure 1. The system cost constraint consists of hardware cost and software cost, which correspond to FPGA slice usage and memory size, respectively. The power-consumption constraint limits the total system power dissipation after hardware-software partitioning. The last constraint concerns the number of processors in the system. In a single-processor environment, nodes f, g, and h in Figure 1 cannot be assigned to software simultaneously, because a processor can execute only one job at a time. Similarly, in a two-processor environment, no more than two of nodes f, g, and h can be assigned to software. Assigning more of them to software than there are processors causes a concurrency conflict.

3. Hardware-oriented partition

The number of slices available depends entirely on the FPGA architecture chosen for development. When developing an embedded multiprocessor FPGA system, designers regard all of the slices in the selected FPGA as hardware cost. If a design uses only 5% of the slices, the hardware cost still covers every slice of the selected FPGA, so up to 95% of the slices are wasted. In contrast, if the partitioning result re-assigns as many software FEs to hardware as possible, execution time and memory size improve, because the final design uses more hardware. From the hardware-cost point of view, the more slices are used, the lower the effective cost. Therefore, we propose the hardware-oriented partition (HOP) algorithm shown in Table 1. HOP increases slice utilization to obtain a more efficient hardware-software partitioning result with minimum execution time and memory size. In addition, HOP takes the concurrency of embedded multiprocessors into account.

Table 1. HOP partitioning algorithm
Input: CDFG function elements (FEi), system constraints, number of processors
Output: mapping of all function elements onto hardware components and software components
  start with all FEs as software (unfixed_sw)
  for each level li in the CDFG do
    if (concurrency) {
      sort all FEi by texe in descending order
      for each FEi(unfixed_sw) in li do
        FEi(unfixed_hw) = FEi(unfixed_sw(max(texe)))
        if ((Ptotal > Pconstraint) or (Stotal > Sconstraint)) {
          FEi(unfixed_sw) = FEi(unfixed_hw)
        }
    }
  sort all FEj(unfixed_sw) in the CDFG by texe in descending order
  for each FEj(unfixed_sw) in the CDFG do
    FEj(unfixed_hw) = FEj(unfixed_sw(max(texe)))
    if ((Ptotal < Pconstraint) and (Stotal < Sconstraint) and (t' < t)) {
      FEj(fixed_hw) = FEj(unfixed_hw)
    } else {
      FEj(fixed_sw) = FEj(unfixed_hw)
    }
Here Ptotal and Stotal are the total power consumption and slice usage of the current partition, and t' and t are the system execution times after and before the tentative move.
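A minimal Python sketch of the HOP algorithm in Table 1 follows. It is an interpretation under stated assumptions, not the authors' implementation: Ptotal is taken as the sum of the per-FE power of the chosen implementations, Stotal as the sum of the hardware cost (slices) of the FEs placed in hardware, the system execution time as the sum over CDFG levels of the slowest FE in each level (the level ET of Section 3.1), and a level is treated as having a concurrency conflict when it holds more software FEs than there are processors. The first loop stops moving FEs once a level is conflict-free, following the "iterate until no concurrency" rule of Section 3.1; if the constraints prevent a conflict from being resolved, the sketch simply leaves it unresolved. The `Spec` type, the `hop` function, and the example limits are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    """Per-FE figures in the style of Table 2 (hypothetical container)."""
    t_hw: float     # execution time in hardware
    t_sw: float     # execution time in software
    cost_hw: float  # hardware cost (slices), counted only when the FE is in hardware
    p_hw: float     # power consumption in hardware
    p_sw: float     # power consumption in software


def hop(levels, spec, n_proc, p_max, s_max):
    """Sketch of Table 1. Returns a dict mapping each FE name to "HW" or "SW"."""
    impl = {fe: "SW" for level in levels for fe in level}  # start all in software

    def t_of(fe):
        return spec[fe].t_hw if impl[fe] == "HW" else spec[fe].t_sw

    def system_time():
        # Assumed model: each level costs its slowest FE; levels run one after another.
        return sum(max(t_of(fe) for fe in level) for level in levels)

    def p_total():
        return sum(spec[fe].p_hw if impl[fe] == "HW" else spec[fe].p_sw for fe in impl)

    def s_total():
        return sum(spec[fe].cost_hw for fe in impl if impl[fe] == "HW")

    def conflict(level):
        # More software FEs in one level than processors to run them.
        return sum(impl[fe] == "SW" for fe in level) > n_proc

    # First loop of Table 1: resolve concurrency conflicts level by level.
    for level in levels:
        if not conflict(level):
            continue
        for fe in sorted((f for f in level if impl[f] == "SW"),
                         key=lambda f: spec[f].t_sw, reverse=True):
            impl[fe] = "HW"                  # tentatively move the slowest SW FE
            if p_total() > p_max or s_total() > s_max:
                impl[fe] = "SW"              # constraint violated: undo the move
            if not conflict(level):
                break                        # assumption: stop once conflict-free

    # Second loop of Table 1: try every remaining software FE in hardware.
    for fe in sorted((f for f in impl if impl[f] == "SW"),
                     key=lambda f: spec[f].t_sw, reverse=True):
        t_before = system_time()
        impl[fe] = "HW"
        if not (p_total() < p_max and s_total() < s_max and system_time() < t_before):
            impl[fe] = "SW"                  # no gain or a constraint broken: keep in SW
    return impl


# Hypothetical three-level CDFG using the Table 2 figures of a, f, g, h and l,
# one processor, and arbitrary power and slice limits.
SPEC = {
    "a": Spec(2.5, 5, 5.5, 6.5, 0.5),
    "f": Spec(1.75, 4, 5, 6, 0.75),
    "g": Spec(2.5, 5, 5.5, 6.5, 0.5),
    "h": Spec(3.25, 6, 6.5, 7.5, 0.25),
    "l": Spec(2, 4, 4, 5, 0.75),
}
LEVELS = [["a"], ["f", "g", "h"], ["l"]]
result = hop(LEVELS, SPEC, n_proc=1, p_max=23, s_max=20)
print(result)  # {'a': 'HW', 'f': 'SW', 'g': 'HW', 'h': 'HW', 'l': 'SW'}
```

In this example the first loop moves h and then g to hardware to clear the conflict in the middle level, and the second loop accepts a (the level time drops from 5 to 2.5) but rejects f and l because either would push the total power past the limit.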

3.1 Concurrency
Concurrency must also be resolved by the re-assignment process, so re-assignment is central both to maximizing FPGA slice utilization and to resolving concurrency. What, then, is the best re-assignment strategy in HOP? Our strategy is as follows. First, we define the execution time (ET) of each CDFG level as the maximum execution time of the FEs in that level, and we set all FEs to software. Second, we search all FEs in each level and record the concurrent FEs. Third, we re-assign software FEs to hardware to improve the execution time and the slice utilization. Fourth, we check whether the new partitioning result meets the system constraints without a concurrency conflict. These steps are iterated until no concurrency conflict remains. Finally, FEs on the same path are allocated to the same processor for execution, which reduces communication time.
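The effect of the re-assignment step on a single level can be worked through for nodes f, g, and h, using their Table 2 execution times. The Python sketch below moves the slowest software FEs of one level into hardware until no more FEs remain in software than there are processors, then reports the level ET (the maximum FE execution time in the level). It deliberately ignores the power and slice constraints, and `resolve_level` is a hypothetical helper, not part of HOP as published.

```python
def resolve_level(t_sw, t_hw, n_proc):
    """Move the slowest software FEs of one CDFG level to hardware until at most
    n_proc remain in software; return (FEs moved to hardware, level ET).

    Simplification: power and slice constraints are ignored here.
    """
    software = sorted(t_sw, key=t_sw.get, reverse=True)  # slowest first
    hardware = set()
    while len(software) > n_proc:
        hardware.add(software.pop(0))
    level_et = max([t_sw[fe] for fe in software] + [t_hw[fe] for fe in hardware])
    return hardware, level_et


# Nodes f, g and h with their Table 2 execution times.
T_SW = {"f": 4, "g": 5, "h": 6}
T_HW = {"f": 1.75, "g": 2.5, "h": 3.25}
for n in (1, 2, 3):
    print(n, resolve_level(T_SW, T_HW, n))
```

With one processor, h and g move to hardware and the level ET drops from 6 to 4; with two processors only h moves and the level ET is 5; with three processors nothing needs to move.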

Figure 2. Partitioning result by HOP (control steps 1 to 4; system execution time = 25)

3.2 Advantages
After resolving concurrency, HOP aims to improve the execution time. First, we sort the FEs by execution time in descending order. Second, we re-assign a software FE to hardware and check whether the execution time improves while the system constraints are still met. If not, the FE is restored to software. Finally, we repeat the second step until every remaining software FE has been tried. Table 2 lists the function element specification, giving the hardware and software execution time, cost, and power consumption of each FE. Figure 1 shows a partitioning result obtained without HOP, with a system execution time of 27. Re-assigning FEe, FEh, and FEk to hardware and FEc, FEf, and FEi to software yields the better result shown in Figure 2.

Table 2. CDFG function element specification
FE                a     b     c     d     e     f     g     h     i     j     k     l
Exec. time (HW)   2.5   1.5   2     2.25  2.75  1.75  2.5   3.25  2.25  2.5   1.25  2
Exec. time (SW)   5     3     4     5     6     4     5     6     5     6     3     4
Cost (HW)         5.5   4.5   5     5.25  5.75  5     5.5   6.5   5.5   5.5   4.25  4
Cost (SW)         2.5   2     3     3.5   3.75  2.5   1.5   3.75  3.5   3.25  1.25  2.5
Power (HW)        6.5   5.5   6     6.25  6.75  6     6.5   7.5   6.5   6.5   5.25  5
Power (SW)        0.5   1     0.75  0.5   0.25  0.75  0.5   0.25  0.5   0.25  1     0.75
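Table 2 can be used directly to follow the mechanical parts of Section 3.2. The Python sketch below transcribes the table, lists the order in which the FEs would be tried if sorted by software execution time in descending order, and computes how the Figure 1 to Figure 2 swap (FEe, FEh, FEk to hardware; FEc, FEf, FEi to software) changes hardware cost, software cost, and power. Using software execution time as the sort key and treating every total as a plain sum over FEs are assumptions for illustration, not details the paper specifies; execution time is not recomputed, because the graph of Figure 1 is not recoverable from the text.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One column of Table 2."""
    t_hw: float
    t_sw: float
    cost_hw: float
    cost_sw: float
    p_hw: float
    p_sw: float


# Table 2, transcribed.
TABLE2 = {
    "a": Row(2.5, 5, 5.5, 2.5, 6.5, 0.5),
    "b": Row(1.5, 3, 4.5, 2, 5.5, 1),
    "c": Row(2, 4, 5, 3, 6, 0.75),
    "d": Row(2.25, 5, 5.25, 3.5, 6.25, 0.5),
    "e": Row(2.75, 6, 5.75, 3.75, 6.75, 0.25),
    "f": Row(1.75, 4, 5, 2.5, 6, 0.75),
    "g": Row(2.5, 5, 5.5, 1.5, 6.5, 0.5),
    "h": Row(3.25, 6, 6.5, 3.75, 7.5, 0.25),
    "i": Row(2.25, 5, 5.5, 3.5, 6.5, 0.5),
    "j": Row(2.5, 6, 5.5, 3.25, 6.5, 0.25),
    "k": Row(1.25, 3, 4.25, 1.25, 5.25, 1),
    "l": Row(2, 4, 4, 2.5, 5, 0.75),
}


def try_order(table):
    """FEs sorted by software execution time, longest first (ties keep table order)."""
    return sorted(table, key=lambda fe: table[fe].t_sw, reverse=True)


def swap_delta(table, to_hw, to_sw):
    """Change in (hardware cost, software cost, power) when the FEs in to_hw move
    from software to hardware and those in to_sw move the other way.
    Assumes every total is a plain sum over FEs."""
    d_cost_hw = sum(table[fe].cost_hw for fe in to_hw) - sum(table[fe].cost_hw for fe in to_sw)
    d_cost_sw = sum(table[fe].cost_sw for fe in to_sw) - sum(table[fe].cost_sw for fe in to_hw)
    d_power = (sum(table[fe].p_hw - table[fe].p_sw for fe in to_hw)
               + sum(table[fe].p_sw - table[fe].p_hw for fe in to_sw))
    return d_cost_hw, d_cost_sw, d_power


print(try_order(TABLE2))
print(swap_delta(TABLE2, to_hw=["e", "h", "k"], to_sw=["c", "f", "i"]))  # (1.0, 0.25, 1.5)
```

Under these assumptions the swap costs 1.0 unit of hardware cost, 0.25 unit of software cost, and 1.5 units of power, in exchange for the shorter schedule of Figure 2 (25 versus 27).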

4. Experimental results
We use a JPEG encoding system as an example to demonstrate the proposed partitioning method on an embedded multiprocessor FPGA system. Figure 3 shows the CDFG of the JPEG encoding system. We also developed two auxiliary applications, shown as the SW Only block in Figure 3, which read the BMP file and convert it from BMP to YUV format. Every FE from FEa to FEv was developed both as hardware in Verilog and as software in C. The execution time, memory, slice usage, and power consumption of the hardware and software versions of each FE were measured so that the partitioning results produced by GA and HOP could be analyzed. The experimental environment consists of a personal computer (PC), a Xilinx ML310 FPGA board, and Xilinx ISE 7.1i and EDK 7.1i. The PC has a 2.4 GHz Pentium IV processor and 256 MB of RAM. The ML310 platform provides 13696 slices, 2448×10³ bytes of memory, and two embedded microprocessors. The first example requires power consumption below 600×10⁻³ W. Next, we run a series of experiments with power constraints from 650×10⁻³ W to 900×10⁻³ W and analyze the execution time, memory size, slice usage, and power consumption obtained under each. Table 3 shows the partitioning results for the function elements FEs = {FEa, FEb, …, FEv} as FEi = {0010010101110110111010} and FEj = {0100111110101110101110}, produced by GA and HOP respectively, where 0 represents software and 1 represents hardware. HOP assigns more FEs to hardware than GA (14 versus 12), and it also outperforms GA in execution time, memory, and slice usage. Although HOP consumes more power than GA, its power consumption remains below the system constraint. Table 4 lists a series of examples whose power constraints range from 650×10⁻³ W to 900×10⁻³ W. In all of them, the execution time is about 20008.93×10⁻⁶ s.
From the memory-size point of view, memory size drops noticeably at 900×10⁻³ W, because the partitioning result moves more function elements to hardware and correspondingly fewer to software. The other two constraints, slice usage and power consumption, are satisfied in every example.

Table 4. Results for JPEG encoding system by HOP
Example (power constraint, 10⁻³ W)   650       700       750       800       850       900
Exec. (10⁻⁶ s)                       20008.93  20008.93  20008.93  20008.93  20008.93  20008.93
Mem. (10³ byte)                      14.082    14.082    14.082    14.082    14.082    7.041
Slice                                Yes       Yes       Yes       Yes       Yes       Yes
Pow.                                 Yes       Yes       Yes       Yes       Yes       Yes
Remark: Exec.: execution time; Mem.: memory size; Slice: slice-usage constraint satisfied; Pow.: power-consumption constraint satisfied.
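The bit vectors reported for GA and HOP can be checked against the FE counts in Table 3 with a short Python sketch. It assumes that bit k of each vector corresponds to the k-th FE in alphabetical order (FEa to FEv), which the text implies but does not state; `decode` is a hypothetical helper.

```python
from string import ascii_lowercase

GA_BITS = "0010010101110110111010"   # FEi in the text, produced by GA
HOP_BITS = "0100111110101110101110"  # FEj in the text, produced by HOP


def decode(bits):
    """Split FEa..FEv into hardware (bit 1) and software (bit 0) lists.
    Assumption: bit k belongs to the k-th FE in alphabetical order."""
    names = ascii_lowercase[:len(bits)]
    hardware = [name for name, bit in zip(names, bits) if bit == "1"]
    software = [name for name, bit in zip(names, bits) if bit == "0"]
    return hardware, software


for label, bits in (("GA", GA_BITS), ("HOP", HOP_BITS)):
    hw, sw = decode(bits)
    print(label, "H/W:", len(hw), "S/W:", len(sw), "hardware FEs:", "".join(hw))
```

Decoding gives 12 hardware and 10 software FEs for GA and 14 hardware and 8 software FEs for HOP, matching Table 3. Under the assumed bit order, FEa (Level Offset) stays in software and FEs (RLE) is in hardware in both results.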

5. Conclusion
The hardware-oriented partitioning method can be used when developing an embedded multiprocessor FPGA system. Experimental results show that HOP obtains a better partitioning result, with reduced execution time and memory size and a higher slice utilization rate. Moreover, HOP takes concurrency conflicts into account, so communication time can be reduced by allocating FEs on the same path to the same processor.

Figure 3. CDFG for JPEG encoding system (SW Only block: Read .Bmp File and YUV; control steps 1 to 6; labeled nodes include Level Offset (FEa), DPCM (FEh, FEj, FEl), ZigZag (FEi, FEk, FEm), and RLE (FEs))

Table 3. Partitioning results for JPEG encoding system by GA and HOP
Results                GA [5]        HOP (Proposed)
Number of S/W FEs      10            8
Number of H/W FEs      12            14
Execution time         20111.26 µs   20066.64 µs
Memory                 146.509 kb    129.68 kb
Slice used rate        47.1%         56.58%
Power consumption      499.121 mW    599.67 mW

References
[1] W. Wolf, "A Decade of Hardware/Software Codesign," IEEE Computer, Vol. 36, pp. 38-43, 2003.
[2] N. S. Woo, A. E. Dunlop, and W. Wolf, "Codesign from Cospecification," IEEE Computer, Vol. 27, pp. 42-47, Jan. 1994.
[3] T. Y. Lee, P. A. Hsiung, and S. J. Chen, "Hardware-software Multi-Level Partitioning for Distributed Embedded Multiprocessor Systems," IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, pp. 614-626, 2001.
[4] R. Ernst, J. Henkel, and T. Benner, "Hardware-software Cosynthesis for Microcontrollers," IEEE Design & Test of Computers, Vol. 10, pp. 64-75, Dec. 1993.
[5] Y. Zou, Z. Zhuang, and H. Cheng, "HW-SW Partitioning based on Genetic Algorithm," Proceedings of Congress on Evolutionary Computation (CEC2004), Vol. 1, pp. 628-633, Jun. 19-23, 2004.