
CHIPCFLOW - A DYNAMIC DATAFLOW MACHINE USING DYNAMIC RECONFIGURABLE HARDWARE

Jorge L. Silva, Joelmir J. Lopes
Department of Computer Systems, University of Sao Paulo
Av. Trabalhador Saocarlense, 400, Sao Carlos, SP, Brazil
emails: jsilva@icmc.usp.br, joelmir@icmc.usp.br

Valentin O. Roda, Kelton P. Costa
Department of Electrical Engineering, University of Sao Paulo
Av. Trabalhador Saocarlense, 400, Sao Carlos, SP, Brazil
emails: valentin@sel.eesc.usp.br, kelton.costa@gmail.com

ABSTRACT

A Control Dataflow Graph (CDFG) is a fundamental element in converting a High Level Language (HLL) into hardware. Alternatively, a Dataflow Architecture can be obtained directly from the CDFG. From the 1970s to the late 1980s, the Dataflow Model was a focus of attention because it provides parallelism in a natural form. In particular, a dynamic dataflow architecture can be generated to produce a high level of parallelism. In this paper, the ChipCflow project is described as a system to convert HLL into a dynamic dataflow graph to be executed in dynamically reconfigurable hardware, exploiting dynamic reconfiguration. ChipCflow consists of several parts: the compiler that converts the C program into a dataflow graph; the operators and their instances; the tagged-token; and the matching of data. Some results are presented as a proof of concept for the project.

1. INTRODUCTION

A Dataflow Architecture is an architecture in which parallelism is naturally present. This kind of architecture was first researched in the 1970s and was discontinued in the 1990s [1, 4, 7]. With advances in microelectronics, the Field Programmable Gate Array (FPGA) has come into wide use, mainly because of its flexibility, the ease of implementing complex systems and its intrinsic parallelism. Thus, dataflow architecture is a topic of interest once more [2, 5], especially because of reconfigurable architectures, which are based entirely on FPGAs. In addition, great effort is being made to convert high-level languages such as C into hardware, so that engineers can design their systems at a high level of abstraction as well as at the digital logic level. In particular, the ChipCflow project is a system where a C program is first converted into a dynamic dataflow graph and then executed on reconfigurable hardware. Its flow diagram is shown in Figure 1. As Figure 1 shows, the ChipCflow system begins on a host machine, where a C program is edited and converted into a control dataflow graph (CDFG), generating a CDFG object program. The CDFG object program is converted into VHDL, with the CDFG modules taken from a database of VHDL modules. After the complete VHDL program has been generated, an EDA tool converts it into a bitstream and downloads it to an FPGA. The dynamic reconfiguration available in some FPGAs is used to provide dynamic dataflow execution, which is the purpose of this project.

Fig. 1. The Flow Diagram for ChipCow tool.

Sponsor acknowledgment: CNPq.

A dataflow graph is composed of a set of operators interconnected by arcs. Several items of data can be in an arc leading to an operator. When all input arcs of an operator hold partner data, the operator fires and sends a result to another operator. There are two models of dataflow architecture: static and dynamic. In the static model, only one item of data can be in an arc waiting for its partner, and a protocol must be used to enforce this. Consequently, parallelism is limited to one item of data per arc. In the dynamic model, more than one item of data can be in an arc. The protocol for the dynamic model must also keep track of each item of data in the arc and its partners; a tagged-token is used to identify the data partners. In this case, parallelism is present because many items of data can be in the arcs at once, limited only by the size of the hardware that receives these data.

978-1-4244-3846-4/09/$25.00 2009 IEEE

Section 2 presents the basic structure of ChipCflow: the pre-compiler, its operators, and some examples of graphs. Section 3 describes the instances model, the tagged-token format and the iterative constructors, which allow several instances of an operator to be executed in the dynamic dataflow model. Section 4 describes the matching of data, which identifies partner items of data. Section 5 describes the implementation of the operator and its instances. Finally, Section 6 presents the conclusion and future work.

2. THE C PRE-COMPILER FOR DATAFLOW GRAPHS

After lexical analysis, semantic analysis and code optimization, the code generation stage of the C pre-compiler produces a file with packets of bits that represent the dataflow graph. The format of the packet is shown in Figure 2. The first 4 bits of the packet identify the operator; the second, third and fourth groups of 5 bits identify the three inputs (a, b and c) of the operator; finally, the sixth and seventh groups of 5 bits identify the outputs (s and z) of the operator. This is a generic template for an operator with three input signals and two output signals; however, there are operators with fewer than three input signals and just one output signal. After a C program is compiled, its corresponding dataflow graph and the packets of bits for this graph are generated. Figure 3 shows a single example of a C While statement.
After the While statement is compiled, a dataflow graph is generated. Figure 3 shows that each operator has a set of bits to identify its function, and each arc has a set of bits to identify its interconnections. In particular, the operator at the top left of the figure has the code 0001 and the arcs 00000 (value 0), 00001 (value i), 00010 (a control signal) and 00011 (the output signal), corresponding to three input signals and one output signal respectively. The packet of bits for this particular operator is the first packet in Figure 4. The xxxxx in a packet of bits represents an arc with no connection. Thus, a file with these packets of bits is a binary representation of the dataflow graph extracted from a While statement by the C pre-compiler.

Fig. 2. The bit stream for dataflow graph.

Fig. 3. A C program converted into a bit stream.

Fig. 4. The bit stream generated from a C program.
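The packet described above can be sketched in C as a decoder for one packet written as a string of '0', '1' and 'x' characters. This is only an illustration under stated assumptions: the layout (one 4-bit operator code followed by five contiguous 5-bit arc fields a, b, c, s, z) is inferred from the prose rather than taken from Figure 2, and the names Packet, ARC_NONE, parse_field and packet_decode are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout (not confirmed by Figure 2): a 4-bit operator code
 * followed by five 5-bit arc fields a, b, c (inputs) and s, z (outputs). */
#define ARC_NONE 255          /* hypothetical marker for an "xxxxx" arc */

typedef struct {
    uint8_t op;               /* 4-bit operator code */
    uint8_t arc[5];           /* a, b, c, s, z: arc id 0..31 or ARC_NONE */
} Packet;

/* Parse `width` characters at *s. Returns the binary value, ARC_NONE if
 * every character is 'x', or -1 for anything else (including a short
 * string or a mix of digits and 'x'). Advances *s on success. */
static int parse_field(const char **s, int width)
{
    int value = 0, unconnected = 0;
    for (int i = 0; i < width; i++) {
        char ch = (*s)[i];
        if (ch == 'x')
            unconnected++;
        else if (ch == '0' || ch == '1')
            value = (value << 1) | (ch - '0');
        else
            return -1;        /* stops at the first bad character or NUL */
    }
    *s += width;
    if (unconnected == width)
        return ARC_NONE;
    return unconnected == 0 ? value : -1;
}

/* Decode one 29-character packet string. Returns 0 on success, -1 on error. */
int packet_decode(const char *text, Packet *out)
{
    assert(text != NULL && out != NULL);
    int op = parse_field(&text, 4);
    if (op < 0 || op == ARC_NONE)
        return -1;            /* the operator code must be fully specified */
    out->op = (uint8_t)op;
    for (int i = 0; i < 5; i++) {
        int arc = parse_field(&text, 5);
        if (arc < 0)
            return -1;
        out->arc[i] = (uint8_t)arc;
    }
    return *text == '\0' ? 0 : -1;   /* reject trailing characters */
}
```

Calling packet_decode on the codes quoted in the text (0001, then 00000, 00001, 00010, 00011), with xxxxx assumed for the unused second output, yields operator 1 with arcs 0, 1, 2, 3 and one unconnected arc.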

After generating the file with the binary representation of a dynamic dataflow graph, the C pre-compiler converts the file into VHDL code. A library with all the operators implemented in VHDL is used for this conversion. The information in each packet of bits is used to construct a VHDL component for the operator. The set of components is then used to generate the final VHDL code to be executed in hardware.


3. THE INSTANCES OF THE OPERATORS

As ChipCflow is based on a Dynamic Dataflow Architecture, an arc can be viewed as a buffer in which several items of data wait for their data partners at an operator. In ChipCflow, however, a new instance of the operator is generated for each item of data arriving through the arc, so no data buffer is needed in the arcs. Thus, several instances of an operator can exist, each waiting for the item from its data partner. To implement the model of instances, a process to insert and remove a sub-graph in the original graph was proposed.

3.1. The tagged-token

Tagged-tokens are attached to the items of data so that several executions of the same operator can be instantiated. An example of the application of a tagged-token is the execution of a simple loop in C:

z = 0;
for (i = 0; i < N; i++)
    z = z + (x * y);

The operator * in the dataflow graph can receive N different items of data from x and from y at different times. When a new item of data arrives in x, the corresponding item of data in y may appear at any time; if the item of data y has not yet arrived, a new instance of the operator * is created to wait for the corresponding y. The same happens if y arrives first. Figure 6 shows an example of several instances of a * operator.

3.2. Iterative Constructors

To organize the items of data that enter a program, function, procedure or loop, the tag needs to be created or modified. Thus, iterative constructor operators were implemented, basically to create and adjust the tag of an item of data. In particular, loop statements (while, repeat, for and other types of loop) synchronize the items of data entering the loop. In this case, the iterative constructor operator defines that a new iteration is beginning for that item of data. The same happens when the item of data enters a program, function or procedure.
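The instance behaviour described in Section 3.1 for the * operator can be sketched in C as follows. This is a software stand-in under stated assumptions, not the hardware design: the names MulOperator, Instance and mul_receive, the limit MAX_INSTANCES and the 16-bit opaque tag are hypothetical, and the sequential loop replaces the comparison that the hardware performs in all instances at once.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_INSTANCES 8       /* hypothetical limit on live instances */

typedef struct {
    int      used;            /* 1 while the instance waits for a partner */
    int      port;            /* 0 = the x item is waiting, 1 = the y item */
    uint16_t tag;             /* tag of the waiting item (treated as opaque) */
    int32_t  value;           /* value of the waiting item */
} Instance;

typedef struct {
    Instance slot[MAX_INSTANCES];
} MulOperator;

/* Deliver one item of data to the * operator.
 * Returns 1 and writes *result when an item with the same tag was waiting
 * on the other port (that instance fires and is released), 0 when a new
 * waiting instance was created, and -1 when no free instance is left. */
int mul_receive(MulOperator *op, int port, uint16_t tag,
                int32_t value, int64_t *result)
{
    assert(port == 0 || port == 1);
    int free_slot = -1;
    for (int i = 0; i < MAX_INSTANCES; i++) {
        Instance *in = &op->slot[i];
        if (!in->used) {
            if (free_slot < 0)
                free_slot = i;            /* remember the first free slot */
            continue;
        }
        if (in->tag == tag && in->port != port) {
            *result = (int64_t)in->value * value;   /* partners found: fire */
            in->used = 0;
            return 1;
        }
    }
    if (free_slot < 0)
        return -1;
    op->slot[free_slot] = (Instance){ 1, port, tag, value };
    return 0;
}
```

With a zero-initialised MulOperator, delivering x items for two different iterations creates two waiting instances; the matching y items can then arrive in any order and each one fires only the instance that carries the same tag.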
The iterative constructor operator signals that a new tag is beginning for that item of data [6]. The specific operators that control the iterative constructor are: new tag manager (NTM), new iteration generation (NIG) and new tag destructor (NTD). The NTM operator generates a new tag when the item of data enters a program, function or procedure. The NIG operator modifies a tag, generating a new value for the iteration. The NTD operator removes the information created by the NTM operator. Figure 5 shows the format of the tag used in a dataflow graph according to the iterative constructor operators. The first 4 bits of the tag identify the Activation, which is modified by the NTM operator and indicates that an item of data is entering a program, function or procedure. The second 4 bits identify the Nesting, that is, the loop level at which the item of data is. This part of the tag is also modified by the NTM operator. The next 8 bits identify the Iteration, that is, the iteration inside a loop to which the item of data belongs. This part of the tag is modified by the NIG operator. The NTD operator modifies the tag only when the iteration has finished, a nested loop has concluded or the item of data leaves a program, function or procedure. Finally, the last 32 bits, concatenated with the tag, hold the item of data.

Fig. 5. The Format of the Tagged-Token.
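The tagged-token format can be sketched in C as a 48-bit word held in a 64-bit integer. The bit positions are an assumption: the fields are taken most significant first (activation 47..44, nesting 43..40, iteration 39..32, data 31..0), which agrees with the tag comparison on bits 47 to 32 mentioned in Section 5. The names Token, token_pack, token_unpack, token_tag and nig_next_iteration are hypothetical, and treating the NIG step as an increment of the iteration field is a guess made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions, fields taken most significant first:
 * activation 47..44, nesting 43..40, iteration 39..32, data 31..0. */
typedef struct {
    uint8_t  activation;      /* 4 bits, set by NTM */
    uint8_t  nesting;         /* 4 bits, set by NTM */
    uint8_t  iteration;       /* 8 bits, set by NIG */
    uint32_t data;            /* 32-bit item of data */
} Token;

/* Pack the four fields into the low 48 bits of a 64-bit word. */
uint64_t token_pack(Token t)
{
    assert(t.activation < 16 && t.nesting < 16);
    return ((uint64_t)t.activation << 44)
         | ((uint64_t)t.nesting    << 40)
         | ((uint64_t)t.iteration  << 32)
         |  (uint64_t)t.data;
}

/* Split a packed word back into its fields. */
Token token_unpack(uint64_t w)
{
    Token t;
    t.activation = (uint8_t)((w >> 44) & 0xF);
    t.nesting    = (uint8_t)((w >> 40) & 0xF);
    t.iteration  = (uint8_t)((w >> 32) & 0xFF);
    t.data       = (uint32_t)(w & 0xFFFFFFFFu);
    return t;
}

/* The 16-bit tag (activation, nesting, iteration) sits in bits 47..32. */
uint16_t token_tag(uint64_t w)
{
    return (uint16_t)((w >> 32) & 0xFFFF);
}

/* Hypothetical NIG step: move the token to the next iteration by
 * incrementing the iteration field, wrapping around at 8 bits. */
uint64_t nig_next_iteration(uint64_t w)
{
    Token t = token_unpack(w);
    t.iteration = (uint8_t)(t.iteration + 1);
    return token_pack(t);
}
```

Two tokens are data partners in this sketch when token_tag returns the same value for both, so a token advanced with nig_next_iteration no longer matches tokens of the previous iteration.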

4. THE MATCHING DATA

To match items of data arriving at a specific operator, taking into account the different instances of the same operator, a system was implemented that identifies whether a partner item of data is present in some instance of the current operator [3]. The matching circuit is therefore part of the instance and uses the format of the tag to identify partner items of data. The process begins when the operator receives an item of data. Immediately, all the instances receive the same item of data. This item is simultaneously compared with the items of data already present in each instance of the operator. If there is a match, the instance fires and informs the operator which data to send to the next operator. Otherwise, a new instance is generated and the item of data is loaded into a buffer in that instance. The intercommunication system is of fundamental importance to the matching system, because all input and output items of data pass through the matching system. Figure 6 shows the different instances and their matching circuits (M in the figure). The variable bitM is used by the operator to handle a match in the instances.

Fig. 6. The Instances with their Matching Circuits and the common variable.

5. THE IMPLEMENTATION OF THE OPERATOR AND ITS INSTANCES

An operator is a complex element in ChipCflow, consisting of several parts: the intercommunication system, with a specific protocol to send and receive data; the matching data, to identify data partners for each instance of the operator; the system to generate the instances; and the control to execute these instances. Initially, an ADD operator was implemented as a proof of concept only. A Statechart diagram of the operator is shown in Figure 7. The process begins on receipt of the astr signal, which informs the current operator that it has an item of data to receive from the previous operator; the bitz signal is also tested to check that no data is waiting to be sent to the next operator. If both conditions are true, the next step is to compare the incoming data with its partner, bufa[47-32]=bufb[47-32], according to the tag specified in Figure 5. If there is a partner, the operator executes the operation, sends the result zstrb, and acknowledges the items of data (aack, back). When there is no partner, the data is buffered and no acknowledgements are sent back. The implementation was carried out with the Xilinx ISE 9.2i EDA tool for a Virtex 5 (360 MHz). The operator took 6 ns to handle all the conditions of the protocol, the matching of data and the execution. For this implementation, a static dataflow architecture was tested, without dynamic reconfiguration, which will be the focus of future work.

Fig. 7. The Statechart of an Operator.

6. CONCLUSION

Research on converting a High Level Language (HLL) into hardware has opened up various possibilities, mainly because of the flexibility and capacity of reconfigurable architectures. A Control Dataflow Graph (CDFG) is a fundamental element in this process. Alternatively, a Dataflow Architecture, which was the focus in the 1980s, can be obtained directly from the CDFG. In particular, a dynamic dataflow architecture can be generated in order to produce a high level of parallelism. In this paper, the ChipCflow project was described as a system to convert HLL into a dynamic dataflow graph to be executed in dynamically reconfigurable hardware, exploiting dynamic reconfiguration. The operator, which is the main element in the dataflow graph, was implemented and took 6 ns to execute the whole process. The simulation results demonstrate the proof of concept for the operator. The next steps of the ChipCflow project are to implement the complete model of instances and to carry out an analysis with benchmarks to verify the impact of this approach.
7. REFERENCES

[1] Arvind. Dataflow: Passing the token. ISCA Keynote, 2005.
[2] A. Capelli. A dataflow control unit for C-to-configurable pipelines compilation flow. IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM04), 2004.
[3] A. DeHon. Reconfigurable architecture for general-purpose computing. Ph.D. thesis, Massachusetts Institute of Technology, 1996.
[4] J. B. Dennis. A preliminary architecture for a basic dataflow processor. Proceedings of the 2nd Annual Symposium on Computer Architecture, 1975.
[5] S. Swanson. Wavescalar. 36th Annual International Symposium on Microarchitecture, 2003.
[6] S. Swanson et al. Wavescalar. 36th Annual International Symposium on Microarchitecture, 2003.
[7] A. H. Veen. Dataflow machine architecture. ACM Computing Surveys, 18(4):365-396, 1986.

