Design and Realization of High-Performance Universal Radar Signal Processing System

ICSP2008 Proceedings

Shan-qing Hu, Teng Long

Radar Research Lab, Beijing Institute of Technology, Beijing 100081, P. R. China
E-mail: hushanqing@bit.edu.cn

Abstract: Designing a high-performance universal radar signal processing system addresses two problems: the growing task demands of new radar systems, and the fact that different algorithms require different system structures. Based on a theoretical analysis of shared bus and distributed parallel structures, and on the characteristics of radar signal processing, a high-performance universal radar signal processing system was designed and realized. The system features standardization, modularization, scalability, reconfigurability, hybrid parallel processing and multi-layer interconnection. Typical radar signal processing systems were designed and realized with it, demonstrating its high performance and universality.

Key words: radar signal processing, parallel processing, TS201, FPGA

I. INTRODUCTION

As radar applications spread to new fields and radar functions become more capable, new radar systems place ever higher demands on the signal processing system. Because of the large scale of data and computation, a single chip still cannot meet the needs of radar signal processing, even as chip technology continues to develop rapidly [1]. Traditionally, a radar signal processing system is designed for a given application: the applicable algorithms are chosen first, the system structure is then derived from those algorithms, and the system is assembled from various special-purpose processing modules. Because the system structure is tied to the algorithms, its universality is poor. To meet the requirements of both high performance and universality, it is necessary to design a high-performance universal radar signal processing system.
Compared with a traditional special-purpose radar signal processing system, this high-performance universal system is scalable, reconfigurable and universal. This paper discusses how such a high-performance universal radar signal processing system can be designed and realized.

II. The Model of Parallel Processing Structure

High performance requires multiple processors working in parallel, and the parallel processing structure directly determines the parallel performance. The two most common parallel processing structures, the shared bus structure and the distributed parallel structure, are shown in Fig.1 (P: processor; M: memory; MB: memory bus; NIC: network interface circuitry). In the shared bus structure, multiple processors P share memory over a high-speed bus, and every processor has equal access to the shared memory, the I/O facilities and the operating system services. This structure suits small parallel systems; as the system grows, the bus becomes a serious bottleneck, and once built, the bus is hard to scale. In the distributed parallel structure, multiple processing nodes can be scaled to a larger system through a custom network with high communication bandwidth and low latency. Each processing node has its own physically distributed memory, and data are transferred by message passing [2].
Fig.1 The model of parallel processing structure (panels: shared bus structure, with processors P sharing memory M and IO over a bus; distributed structure, with nodes of P, M, MB and NIC connected by a network)
The purpose of parallel processing is to reduce processing time by distributing a task over multiple processors. This is measured by two indices: the speedup $S$ and the parallel efficiency $E$. The speedup of a parallel system is the ratio of the serial processing time to the parallel processing time, and the parallel efficiency is the speedup per processor, $E = S/p$. The two structures can be compared theoretically with the same equivalent measure used to evaluate the scalability of parallel processing.

First, consider the shared bus. Let $t_e^i$ and $t_o^i$ denote, respectively, the useful processing time and the overhead time (including communication, synchronization and latency) of processor $i$ in the parallel system. Let $r$ be the ratio of computation to communication in each processor's task, meaning that on average one datum must be exchanged for every $r$ operations. Suppose the $p$ processors access the bus in turn, and let $t_{io}$ be the normalized time a processor needs to complete one bus access, i.e. the ratio of the processor's computing capability to the bus access capability. In general, the total useful processing time and the total overhead time are:
$$T_e' = \sum_{i=0}^{p-1} t_e^i \tag{1}$$

$$T_o' = \sum_{i=0}^{p-1} t_o^i \tag{2}$$
Suppose the task is divided equally into $p$ parts, so that every processor has the same useful processing time $t_e$. Then:
978-1-4244-2179-4/08/$25.00 ©2008 IEEE

$$T_e = p\,t_e \tag{3}$$

In the worst case, all $p$ processors access the bus at the same time. Considering the processor that is granted the bus last:
$$T_o = \max_i\left(t_o^i\right) = \frac{p\,t_e\,t_{io}}{r} \tag{4}$$

Let $T_p$ be the execution time of the parallel algorithm on each processor. In the worst case:

$$T_p = t_e + T_o \tag{5}$$

Let the task size $W$ be the amount of computation of the best serial algorithm, namely $W = T_e$. The speedup is then:

$$S = \frac{T_e}{T_p} = \frac{p\,t_e}{t_e + T_o} = \frac{p}{1 + T_o/t_e} = \frac{p}{1 + p\,t_{io}/r} \tag{6}$$

The parallel efficiency is:

$$E = \frac{S}{p} = \frac{1}{1 + p\,t_{io}/r} \tag{7}$$

Clearly, the parallel efficiency of the shared bus system decreases as the number of processors $p$ increases. In the distributed parallel system, by contrast, in the best case any two processors can exchange data through their communication network at any time. Let $t_{comm}$ be the normalized time for the communication network to transmit one datum, i.e. the ratio of the processor's computing capability to the network's transmission capability, and assume that every exchange also involves one access to local memory. The communication overhead is then:

$$T_o = \frac{t_e\,(t_{io} + 2\,t_{comm})}{r} \tag{8}$$

The speedup is:

$$S = \frac{p}{1 + \dfrac{t_{io} + 2\,t_{comm}}{r}} \tag{9}$$

and the parallel efficiency is:

$$E = \frac{1}{1 + \dfrac{t_{io} + 2\,t_{comm}}{r}} \tag{10}$$

The speedup is linear in the processing scale $p$, and the parallel efficiency is independent of $p$. These results assume that the processors are fully connected, i.e. any two processors can exchange data directly. In practice, once the number of processors $p$ exceeds the number of communication links available, the exchange overhead between two indirectly connected processors is proportional to the number of hops on the path between them. This does not affect the derivation above: in a regular network topology the maximum or average path length is a constant $n$, so the speedup of the distributed parallel system becomes:

$$S = \frac{p}{1 + \dfrac{t_{io} + 2n\,t_{comm}}{r}} \tag{11}$$

Hence a linear speedup can still be obtained in the distributed parallel system.

From the theoretical analysis above, the shared bus parallel structure suits the shared memory programming model and supports fine-grained parallel processing, but its scalability is poor: the number of processors, and hence the processing capability of the system, is limited. The distributed parallel structure uses message passing, which suits coarse-grained parallel processing and makes it easy to expand to a large-scale system.

III. The Design and the realization of the processing system

Radar signal processing systems have their own particularities. On the one hand, because of the complexity of radar signal processing algorithms, the system must support diverse parallel processing modes. For coarse-grained algorithms the distributed parallel structure is convenient, whereas for fine-grained algorithms, which need to exchange data frequently, the shared memory parallel structure is preferred. On the other hand, a radar signal processing system carries many kinds of data stream, such as the raw data stream (for instance, the data after A/D sampling), intermediate data streams (data transferred between processing nodes), timing synchronization signals and control data. These streams require different transmission bandwidths, so the system needs networks matched to each of them.

Based on the characteristics of radar signal processing and the theoretical analysis of parallel processing structures, the authors designed a high-performance universal radar signal processing system that supports a hybrid parallel processing structure and features standardization, modularization, scalability, reconfigurability and multi-layer interconnection. Fig.2 shows the structure of the system, which uses the high-performance floating-point DSP TS201 and a large-scale FPGA. The system follows the cPCI 6U board standard, and each board integrates two processing nodes. Each processing node consists of two TS201s, one FPGA and SDRAM on a shared bus. Because a shared bus is used only inside each processing node, where the number of processors is small, the node gains the advantages of shared bus parallel processing without degrading bus efficiency. A large-scale distributed parallel processing system is formed by connecting processing nodes through multiple networks. In addition, each board can carry two PMC boards, and integrating different kinds of PMC sub-boards increases the flexibility of the system.
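The trade-off behind this hybrid design can be made concrete by evaluating the speedup and efficiency expressions of Section II (equations (6), (7) and (9) to (11)) numerically. The Python sketch below is illustrative only: the values of $t_{io}$, $t_{comm}$, $r$ and the hop count $n$ are hypothetical, chosen to show the trend rather than measured on this system, and the function names are not from the paper.

```python
# Speedup S and parallel efficiency E for the two structures of Section II.
# t_io, t_comm and r are normalized, dimensionless parameters; the values
# used in the demo are illustrative assumptions, not measurements.

def shared_bus(p: int, t_io: float, r: float) -> tuple[float, float]:
    """Eqs. (6)-(7): worst case, all p processors contend for one bus."""
    s = p / (1.0 + p * t_io / r)
    return s, s / p

def distributed(p: int, t_io: float, t_comm: float, r: float,
                n: int = 1) -> tuple[float, float]:
    """Eq. (11); with n = 1 (fully connected) it reduces to eqs. (9)-(10)."""
    s = p / (1.0 + (t_io + 2.0 * n * t_comm) / r)
    return s, s / p

if __name__ == "__main__":
    t_io, t_comm, r = 0.5, 0.5, 10.0  # hypothetical parameter values
    for p in (2, 4, 8, 16, 32):
        s_bus, e_bus = shared_bus(p, t_io, r)
        s_net, e_net = distributed(p, t_io, t_comm, r, n=2)
        print(f"p={p:2d}  shared bus: S={s_bus:5.2f} E={e_bus:.2f}   "
              f"distributed (n=2): S={s_net:5.2f} E={e_net:.2f}")
```

With these assumed values the shared bus keeps about 91% efficiency at p = 2 but falls to about 56% at p = 16, while the distributed model stays at 80% for every p. This is consistent with using a shared bus only inside small nodes and a network between them.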
Fig.2 The structure of the processing system (blocks: system PCI bus on J1 and J2, PCI bridge, local PCI bus; Node 0 and Node 1, each with TS201s, SDRAM, a CPLD and an FPGA (FPGA0 or FPGA1) on the DSP bus; PMC0 and PMC1; FPGA2 as Link switch with 4 Links to J4; RapidIO to J3; synchronization timing through 245 drivers to J5)

IV. DSP+FPGA shared bus processing node

In a radar signal processing system, low-level signal processing algorithms handle so much data that processing speed is critical. Their computational structure, however, is relatively simple and well suited to an FPGA, which offers both speed and flexibility. High-level processing algorithms handle less data than low-level ones, but their control structure is complex; they are therefore better suited to a DSP, which provides high computing speed, flexible addressing modes and a powerful communication mechanism [3]. Accordingly, each processing node in the system designed by the authors consists mainly of DSPs, an FPGA, SDRAM and a CPLD on a shared bus. The DSPs mainly perform high-level algorithm processing; the FPGA implements the node's external interfaces and can perform low-level preprocessing of the data; the SDRAM stores data; and the CPLD implements auxiliary logic.

The DSP chosen for this system is the TS201 produced by ADI (Analog Devices), one of the highest-performance floating-point DSPs currently available in the industry. A single chip delivers up to 3.6 GFLOPS, has 24 Mbit of embedded memory and a 125 MHz/64-bit external bus, which up to 8 TS201s can share. An SDRAM controller with a 1 GB address space is also attached to this bus. Each TS201 has four link ports, each of which can transmit and receive independently, and the peak bandwidth of one link reaches 1.2 GB/s [4]. These characteristics make the TS201 well suited to multi-chip expansion into a large-scale high-performance signal processing system.

The FPGA is the XC2VP20 of the Xilinx Virtex-II Pro series. It offers about 2 million gates, 1584 Kbit of RAM, 88 18×18-bit multipliers and 8 RocketIO high-speed channels whose transmission speed reaches 3.125 Gbps [5]. These characteristics make the FPGA well suited to data transmission and preprocessing.

On the DSP bus of each processing node there are two TS201s, one FPGA, 4 GB of SDRAM and one CPLD. Building each node from two TS201s on a shared bus keeps its processing, transmission and storage capabilities well matched. The SDRAM controller of the TS201 supports a 1 GB address space; through the CPLD, the SDRAM address space is expanded to 4 GB, and the larger memory suits applications that need large storage.

V. Multi-layer interconnections

Through the PCI bus, the high-speed link ports of the TS201, the standard high-speed serial RapidIO protocol built on the RocketIO physical channels of the FPGAs, and the synchronization timing bus implemented by the CPLDs, the processing system comprises networks at different layers that satisfy the transmission requirements of the different types of data stream in a radar signal processing system.

Following the cPCI standard, the 64-bit system PCI bus is connected through J1 and J2, and the PCI bridge converts between the system PCI bus and the local PCI bus. Each processing node implements its PCI interface in its FPGA (FPGA0 and FPGA1); the two processing nodes and the two PMC sub-boards share the local PCI bus, which is connected to the system PCI bus by the PCI bridge. The system's main control module can therefore control every processing node and PMC sub-board through the PCI bus, and the processing nodes can also exchange data over it. Because of the limitations of the PCI bus, however, only low-speed data exchange is possible this way.

The TS201 has 4 high-speed link ports, which can be used for high-speed data transmission among multiple TS201s. The four TS201s on the board are tightly coupled in a ring, each using two of its own link ports. In addition, one link port of each TS201 is connected to FPGA2; two link ports are defined on the PJ4 connector of each PMC, and four link ports are defined on the J4 connector of the board. All of these link ports are connected to FPGA2 (named the Link Switch). Through FPGA2, processing nodes can be connected flexibly, whether on the same board, between the board and a PMC sub-board, or across boards. Meanwhile, one link port of each of the two TS201s in a processing node connects to the FPGA on that node's bus (FPGA0 or FPGA1), where it is matched to the FPGA's external serial RapidIO interface; this allows data to be exchanged between the external serial RapidIO network and the TS201s inside the node. Link ports offer high bandwidth and low latency, so they are suited to exchanging high-bandwidth raw data streams and intermediate data streams.

Serial RapidIO is a third-generation high-speed, point-to-point, packet-switched interconnect protocol. Compared with the TS201 link protocol, it has a more complete protocol definition (covering the logical, transport and physical layers) and can exchange data among nodes by message passing or through shared memory. The protocol makes the processing node more universal: a node can connect not only to processing nodes of the same type but also to processing nodes of any other type that provide a serial RapidIO interface. The serial RapidIO protocol is implemented on the FPGA's RocketIO physical channels through FPGA programming. FPGA0 and FPGA1 each connect to J3 through 4 RocketIO channels, so a processing node can connect to other nodes through a 4x serial RapidIO interface on J3. The serial RapidIO network offers high bandwidth.
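The peak rates quoted in Sections IV and V can be placed side by side to show why each network layer is assigned to a different kind of data stream. The Python sketch below is a back-of-envelope calculation, not a measurement. It assumes one transfer per clock on the parallel buses, 8b/10b line coding on the RocketIO lanes (so 80% of the 3.125 Gbps line rate carries data, before RapidIO packet overhead), and a 33 MHz clock for the 64-bit cPCI bus, which the text does not specify. The helper names are hypothetical.

```python
# Rough peak bandwidth of each interconnect layer, in MB/s (1 MB = 10^6 bytes).
# Entries marked "from text" use figures stated in the paper; the 33 MHz PCI
# clock and the 8b/10b coding efficiency are assumptions.

def parallel_bus_mb_s(clock_mhz: float, width_bits: int) -> float:
    """Peak rate of a parallel bus, assuming one transfer per clock cycle."""
    return clock_mhz * width_bits / 8

def serial_lanes_mb_s(line_rate_gbps: float, lanes: int,
                      coding_efficiency: float = 0.8) -> float:
    """Data rate of serial lanes in one direction; 0.8 models 8b/10b coding."""
    return line_rate_gbps * 1000 * coding_efficiency * lanes / 8

layers = {
    "cPCI system bus, 64 bit at an assumed 33 MHz": parallel_bus_mb_s(33, 64),
    "TS201 DSP bus, 125 MHz x 64 bit (from text)": parallel_bus_mb_s(125, 64),
    "TS201 link port, peak (from text)": 1200.0,
    "Serial RapidIO 4x, 3.125 Gbps lanes (from text)": serial_lanes_mb_s(3.125, 4),
}

if __name__ == "__main__":
    # Print the layers from slowest to fastest.
    for name, rate in sorted(layers.items(), key=lambda item: item[1]):
        print(f"{rate:8.1f} MB/s  {name}")
```

Under these assumptions the PCI layer runs at roughly a quarter of the other layers' rates, in line with its use for control and low-speed exchange, while a link port and a 4x RapidIO port are of the same order.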
Compared with the Link interface, serial RapidIO has more complete protocol control, but the complexity of that control also gives it higher latency. The serial RapidIO network is likewise used mainly for raw data streams and some high-bandwidth intermediate data streams; it also complements the Link network well, making the system structure more flexible.

The synchronization timing signals defined on J5 implement synchronization timing control among the processing nodes. These signals connect to the CPLD inside each processing node through 245 drivers. Each TS201 can operate the CPLD in its node through interrupts, flags or register reads and writes. Because the 245 drivers are bidirectional, every node can both transmit and receive the synchronization signals. All external interfaces of a node are brought out to the five connectors J1 to J5, so the interfaces of different nodes can be interconnected on the motherboard. The motherboard may use a fixed structure or include switching chips, so many kinds of interconnection can be built.

VI. Experiment Validation

Phased array radar and synthetic aperture radar are among the newest radar systems today. In practical applications, the authors used the processing system described above to build several phased array radar and synthetic aperture radar signal processing systems. Fig.3(a) shows a phased array radar signal processing system. It uses optical fiber to transmit data between the phased array antenna subarrays and the processing system. The optical fiber interface board follows the PMC standard and can be integrated into the processing system.
Each processing module integrates two optical fiber interface boards. Each board receives the echo data from one array and distributes it to the processing nodes through the Link ports of the Link Switch defined on the PJ4 connector of each PMC board. Each processing node performs beamforming and then sends the resulting subarray beam, through the serial RapidIO interface defined on J3, to the processing module responsible for subarray beamforming. That module performs subarray beamforming and the other radar signal processing tasks, and it also carries the PMC synchronization timing module, which generates the synchronization timing signals for every module in the system so that all modules work in step. The processing system uses a data-parallel mode, in which one node processes the beam of one subarray; the antenna array can therefore be enlarged or reduced flexibly by adding or removing processing nodes.

Fig.3(b) shows the radar signal processing system of a synthetic aperture radar. It consists of one high-speed ADC interface module (a cPCI board) and multiple processing modules. Processing uses a stream (pipelined) parallel mode, in which each processing node performs the real-time imaging of one strip. The high-speed ADC module and the processing modules form a pipeline through serial RapidIO or Link interfaces. The ADC module sends the acquired data to the processing node at the head of the pipeline, which takes one strip's data for imaging and forwards the rest to the next processing node, and so on through the whole system. Increasing the number of processing nodes therefore raises the computing capability of the system, allowing real-time imaging at higher resolution.
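The stream mode of the SAR system can be summarized in a few lines. The Python sketch below is a simplified, hypothetical model rather than the authors' implementation: each node in the chain keeps the first strip of the data it receives and forwards the remainder, and the node count needed for real-time operation is estimated by assuming that strips arrive at a fixed interval and that imaging one strip takes a fixed time on one node. The function names and timing figures are illustrative assumptions.

```python
import math

def route_through_chain(strips: list, num_nodes: int) -> tuple[dict, list]:
    """Pass one batch of strips down the node chain (hypothetical model).

    Each node intercepts the first strip it receives for local imaging and
    forwards the rest to the next node; strips beyond the chain length are
    returned as left over (they would wait for the next processing round).
    """
    assigned = {}
    remaining = list(strips)
    for node in range(num_nodes):
        if not remaining:
            break
        assigned[node] = remaining[0]   # keep one strip for imaging
        remaining = remaining[1:]       # forward the rest downstream
    return assigned, remaining

def nodes_for_real_time(imaging_time_s: float, strip_interval_s: float) -> int:
    """Smallest node count N with N * strip_interval_s >= imaging_time_s.

    With strips handed out in turn, each node receives a new strip every
    N * strip_interval_s seconds and must finish imaging within that time.
    """
    return math.ceil(imaging_time_s / strip_interval_s)

if __name__ == "__main__":
    assigned, left = route_through_chain(["s0", "s1", "s2", "s3", "s4"], 4)
    print(assigned, left)
    # Hypothetical timing: 0.9 s to image one strip, a new strip every 0.25 s.
    print(nodes_for_real_time(0.9, 0.25))
```

Under this model, roughly doubling the imaging time per strip (for example at a finer resolution) roughly doubles the required node count, which mirrors the point that adding processing nodes lets the pipeline keep up with higher-resolution real-time imaging.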
Fig.3 Phased array radar and synthetic aperture radar signal processing system: (a) phased array radar (fiber interface boards, synchronization timing board, Modules 1 to N+1, each with a Link Switch and processing nodes); (b) synthetic aperture radar (ADC board feeding Modules 1 to N). Legend: Link, RapidIO, cPCI bus, synchronization timing bus.

VII. CONCLUSIONS

Based on an analysis of the advantages and disadvantages of the shared bus and distributed parallel structures, the authors designed a high-performance universal radar signal processing system according to the characteristics of radar signal processing. The system features a hybrid parallel processing pattern, multi-layer interconnection, standardization, modularization, scalability and reconfigurability. In practice, by combining this processing system with the corresponding I/O modules, several phased array radar and synthetic aperture radar signal processing systems have been built, validating the high performance and universality of the system.

REFERENCES
[1] Bai J-Y and He Z-Z, "The Parallel Technologies of General High Speed Digital Signal Processing," Microelectronics & Computer, 2003, 20(4): 32-34.
[2] David E. Culler, Parallel Computer Architecture: A Hardware/Software Approach, China Machine Press, Beijing, 2002.
[3] Yuan J-Q and Huang P-K, "Design of real-time digital signal processing system based on DSP and FPGA," Journal of Systems Engineering and Electronics, 2004, 26(11): 1561-1563.
[4] Analog Devices Inc., ADSP-TS201 TigerSHARC Processor Hardware Reference, November 2004.
[5] Xilinx Inc., Virtex-II Pro Platform FPGA User Guide, February 2004.