Euclidean Distance

(Published in the Periodical of the VLSI Society of India VSI VISION Vol 1, Issue 2, 2005)
Euclidean Distance Computation Algorithm for QAM Applications

*Prajnanam Project Team*, S S Mahant Shetti, Senior Member, IEEE
Abstract: Quadrature Amplitude Modulation (QAM) is widely used in digital communications and needs fast algorithms to decode the transmitted information. Several processor-based algorithms, working on principles such as sorting and Manhattan distance computation, are used to trace the constellation point in the diagram and retrieve the information. This paper proposes a very simple algorithm based on Euclidean distance computation that takes a small silicon area to implement. It shows how engineering approximations can be made to obtain a simple, fast, combinational-logic-based implementation of the Euclidean distance computation. Results simulated in SPICE, and a demonstration of the logic using Excel-generated plots, are presented.

Index Terms: Combinational logic, CMOS, Custom LSI, Euclidean Distance, VLSI Techniques.

I. Introduction
Digital communication predominantly uses the quadrature amplitude modulation technique [1][2] for coded signal transmission. The scheme works with a demodulator that down-converts the RF signal and recovers the original information from the received signal. QAM uses two orthogonal signals; scaling them appropriately modulates the wave into different magnitudes and phases, which are represented graphically as points in the constellation diagram shown in fig (1). It is therefore usual to identify the scheme as 16, 32, ... 256-state QAM. The modulation factors of the cosine and sine waves are the Cartesian coordinates in the constellation diagram, which thus represents the quality and distortion of the received signal. The in-phase and quadrature components are modified such that the carrier experiences both amplitude and phase modulation: one of the signals may be magnified while the other is attenuated. On demodulation, the scaling factors obtained describe the phase and amplitude distortions of the in-phase and quadrature signals, which shift the point along a definite locus in the constellation diagram. The distance between the received signal point and the intended signal point is a measure of the noise added to the signal.

Large-scale integrated circuits based on QAM are frequently used in equipment such as videophones and mobile phones, and in several defence applications. With noise added during transmission, the received point lies in a locus around the intended signal point and, with large noise, may fall anywhere in the constellation diagram. To recover the information contained in the received signal, it is first necessary to identify the intended point in the constellation diagram and then decode the information from it.

Researchers have developed several algorithms, such as sorting and Manhattan distance computation, for identifying the intended signal point. The signal constellation for 16-state QAM consists of a square lattice of 16 message points. Finding the constellation point at the shortest distance from the received signal allows the algorithm to identify the most probable point that the received signal intends. Distance finding is very time consuming, and hence a few algorithms [3] have been developed and are popularly used. This paper proposes a simple, fast, digitally implementable way to estimate the shortest Euclidean distance, so that a combinational-logic-based implementation can be developed. The scheme uses minimal silicon area and yields high speed and performance. A QAM receiver can then be implemented with high-speed, low packaging density circuitry to extract the coded information at the highest possible rate.
Fig.(1) 16-State Constellation Diagram with Gray Codes (axes: Inphase and Quadrature; each of the 16 points is labelled with a 4-bit Gray code)
This paper is an outcome of the Custom LSI 2005 workshop organized jointly by the VLSI Society of India and Karnataka Micro Electronic Design Centre (KarMic), Manipal, from 6th to 20th June 2005 at the KarMic campus, Manipal. The Prajnanam project team consists of 8 UG students, 5 PG students and 10 teaching faculty from various engineering colleges across India. Wreetabratakar, Jagannath, Balagopal G, Ankit Bhansali, B Shyam Sundar Reddy, Anshat Singhal, B Pavan Kumar Reddy and Siddharth A Moghe are the UG students involved in leaf cell generation. Naufal Ashiq Kukkady, Prabhu G, Mehta Darshan, Kiran Joseph and Rana Mukherji are the group involved in cell integration and testing. R. Sakthivel, Amit Prakash Singh, S S Kerur, L Krishnananda, M J Shanthi Prasad, Harish Padiyar U, Vijay Prakash A M, C Y Gopinath, Kalpana G Bhat and Sandra D'souza are the faculty team involved in control signal generation. Email: custom_LSI@yahoogroups.com. S S Mahant Shetti is the Chairman and CEO of Karnataka Micro Electronic Design Centre (KarMic), Manipal, Karnataka, India. Email: mahant@karmic.co.in
II. Development of Algorithm
The Euclidean distance between two points in Cartesian co-ordinates, computed using the square root algorithm, can be expressed as

$$d = \sqrt{(r_x - S_{xn})^2 + (r_y - S_{yn})^2} \qquad (1)$$

where $r_x$ and $r_y$ are the co-ordinates of the received signal and $S_{xn}$ and $S_{yn}$ are the co-ordinates of the $n$th constellation point in two-dimensional space. The distance expression can then be written as

$$d = \sqrt{a^2 + b^2} = a\sqrt{1 + \left(\frac{b}{a}\right)^2} \qquad (2)$$

where $a \ge b$, $a = (r_x - S_{xn})$ and $b = (r_y - S_{yn})$, so that in general

$$d = \max(a, b)\sqrt{1 + \left(\frac{\min(a, b)}{\max(a, b)}\right)^2} \qquad (3)$$
This calculation needs an adder, a divider, a multiplier and a square-root circuit, which in turn need a large silicon area, and the process is time consuming. In high-speed, low-cost implementations the computation algorithm therefore plays a major role. The Euclidean distance can also be visualized as the hypotenuse of a right-angled triangle with sides a and b, the length of the hypotenuse being a function of the two sides. To make this clear, a graph of the distance for different (arbitrarily chosen) values of the sides a and b, ranging over the possible limits, is drawn in fig (2A). In QAM applications, the distance between the constellation points depends on the scaling factors used for modulation. The distance graph is approximated as a set of ordered straight line segments [4], with the slopes of the lines tuned so that they are digitally implementable. The fewer the line segments, the less hardware is needed to implement them. The distance plot reveals that the distance function can be approximated using two straight line segments as

$$d = \begin{cases} \max(a, b) & \text{if } \min(a, b) \ll \max(a, b)/2 \\ \max(a, b) + \dfrac{\min(a, b)}{4} & \text{if } \min(a, b) \ge \dfrac{\max(a, b)}{4} \end{cases} \qquad (4)$$

It is interesting to note that this distance computation requires only one adder and a shift register. It can be implemented using combinational logic circuits and is hence definitely faster. The error with this approach is at most 6.7%, as seen from fig (2B). To make this an acceptable solution, a better approximation of the distance function, shown in fig (3A), is attempted, wherein the distance is expressed as

$$d = \begin{cases} \max(a, b) & \text{if } \min(a, b) < \dfrac{\max(a, b)}{4} \\ \max(a, b) + \dfrac{1}{2}\left(\min(a, b) - \dfrac{\max(a, b)}{4}\right) & \text{if } \min(a, b) \ge \dfrac{\max(a, b)}{4} \end{cases} \qquad (5)$$

This implementation also requires only adders and shift registers, and the maximum error is reduced to 2.98%, as shown in fig (3B). For the QAM application this is an acceptable accuracy.
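The error figure for the second approximation can be checked numerically. The sketch below is only an illustration, not the authors' Excel procedure: it assumes the second approximation is d = max(a, b) when min(a, b) < max(a, b)/4 and d = max(a, b) + (min(a, b) - max(a, b)/4)/2 otherwise, and it sweeps the min/max ratio from 0 to 1. The function names and the step count are hypothetical.

```python
import math

def approx_distance(a, b):
    """Two-segment approximation of sqrt(a*a + b*b) for non-negative a, b."""
    hi, lo = max(a, b), min(a, b)
    if lo < hi / 4:
        return hi                      # flat segment
    return hi + (lo - hi / 4) / 2      # sloped segment

def max_relative_error(steps=10000):
    """Largest |exact - approx| / exact over min/max ratios in [0, 1]."""
    worst = 0.0
    for i in range(steps + 1):
        ratio = i / steps
        exact = math.hypot(1.0, ratio)
        approx = approx_distance(1.0, ratio)
        worst = max(worst, abs(exact - approx) / exact)
    return worst

if __name__ == "__main__":
    print(f"maximum relative error: {max_relative_error():.2%}")
```

Under these assumptions the largest error occurs at a min/max ratio of 1/4 and stays just under 3%, in line with the 2.98% quoted for the second approximation.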
Fig (2A) Distance graph with first Approximation (normalised distance, actual and approximate, versus min/max ratio)
Fig (2B) Error graph with first Approximation (error versus min(a,b)/max(a,b) ratio)
Fig (3A) Distance graph with second Approximation (normalised distance versus min/max ratio; exact in blue, approximation in red)
Fig (3B) Error graph with second Approximation (error in percent versus min/max ratio)
To facilitate digital implementation, the scaling factors are represented in binary, so that the quantization error reduces as the number of bits increases. The error margin of the approximation can be reduced further with a three-segment curve fit, at the cost of some more hardware on the silicon. Minimizing the hardware is necessary to achieve the highest speed of distance computation and thereby decode the information sent along the QAM signals. The issue is a trade-off between high speed at low cost with reasonable accuracy, and higher accuracy at lower speed and higher cost. To achieve high-speed data decoding, the two-segment approximation is reasonable.

The proposed algorithm needs to be tested for a large set of integer numbers. When working with ADC outputs, the data length is crucial to achieving very low error margins. A ten-bit data representation was used to reduce the error due to quantization and approximation. Use of a unipolar ADC allows all four quadrants to be visualized as a single quadrant. The MSB of the ADC output acts as the sign bit and thereby identifies the quadrant of the constellation diagram in which the point lies: sign bits (0, 0) of the two unipolar ADC outputs represent the third quadrant, (0, 1) the second quadrant, (1, 1) the first quadrant and (1, 0) the fourth quadrant. The other nine bits of the two ADC outputs represent the strengths of the two orthogonal signals, i.e. the scaling factors in the digital domain. For example, (1111111111, 1111111111) represents the maximum positive scaling factors, and the point belongs to the first quadrant. Similarly, (0000000000, 0000000000) represents the highest negative scaling factors, and the point belongs to the third quadrant. The absolute value of the highest scaling factor is therefore (111111111) on both axes of the constellation diagram.
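The quadrant selection from the two sign bits can be sketched as follows. This is a minimal illustration that assumes the first word is the in-phase (x) ADC output and the second the quadrature (y) output, and that it uses the conventional quadrant numbering in which two sign bits of 0 (both coordinates negative) denote the third quadrant. The names `split_adc_word` and `quadrant` are hypothetical.

```python
SIGN_BIT_POSITION = 9                          # MSB of a 10-bit ADC word
LOW_BITS_MASK = (1 << SIGN_BIT_POSITION) - 1   # remaining nine bits

# (x sign bit, y sign bit) -> quadrant number; a sign bit of 1 means positive
QUADRANT = {(1, 1): 1, (0, 1): 2, (0, 0): 3, (1, 0): 4}

def split_adc_word(word):
    """Return (sign bit, nine low-order bits) of a 10-bit ADC word."""
    if not 0 <= word < (1 << 10):
        raise ValueError("expected a 10-bit value")
    return (word >> SIGN_BIT_POSITION) & 1, word & LOW_BITS_MASK

def quadrant(x_word, y_word):
    """Quadrant of the constellation diagram selected by the two sign bits."""
    sign_x, _ = split_adc_word(x_word)
    sign_y, _ = split_adc_word(y_word)
    return QUADRANT[(sign_x, sign_y)]
```

For instance, `quadrant(0b1111111111, 0b1111111111)` selects the first quadrant and `quadrant(0, 0)` the third.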
The algorithm developed through approximation was tested with binary numbers to check for quantization error. The errors introduced by digital rounding of the approximation are minimal, at most one LSB, which is only $2^{-10}$ of the maximum signal quantity. The quantization error is therefore insignificant and the total error is unaffected. The section below discusses the development of a functional diagram to implement the Euclidean distance computation between the points.
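The rounding effect can be examined with an integer version of the two-segment approximation that uses only shifts, a comparison and additions. The sketch below is an assumption about how the arithmetic maps to integers (division by 4 and by 2 as right shifts that discard the low bits); it compares the integer result with the unrounded approximation over all pairs of nine-bit magnitudes. All function names are hypothetical.

```python
def approx_distance(a, b):
    """Unrounded two-segment approximation, used here as the reference."""
    hi, lo = max(a, b), min(a, b)
    return hi if lo < hi / 4 else hi + (lo - hi / 4) / 2

def approx_distance_int(a, b):
    """Shift-and-add form of the same approximation for non-negative integers."""
    hi, lo = max(a, b), min(a, b)
    quarter = hi >> 2                  # max/4: drop the two LSBs
    if lo < quarter:
        return hi
    return hi + ((lo - quarter) >> 1)  # add half of (min - max/4)

def max_rounding_gap(limit=511):
    """Largest difference between integer and unrounded results, in LSBs."""
    worst = 0.0
    for a in range(limit + 1):
        for b in range(a + 1):
            gap = abs(approx_distance_int(a, b) - approx_distance(a, b))
            worst = max(worst, gap)
    return worst
```

With these shift-based roundings the integer result stays within one LSB of the unrounded approximation for every pair of nine-bit inputs, which is consistent with the statement above.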
III. Development of Functional Diagram
A functional diagram for computing the distance digitally, developed using the improved low-error curve-fitting approximation, which uses only adder circuits [5], is shown in fig (4). The computation can be explained in four phases.
Phase1: The ADC provides the coordinates of the received signal, whose distance must be found with respect to the four points of the identified quadrant. Using the sign bits, the corresponding sector of the data bank is activated and provides the coordinates of the four points sequentially to the adder circuit, which performs the subtractions one after the other, one in each subsequent clock period. A parallel structure is duplicated to compute $(r_x - S_{xn})$ and $(r_y - S_{yn})$, so that the two sides of the triangle, a and b, are obtained simultaneously. The results of these computations are stored and passed on to the next phase at the negative clock edge. If the result is negative, as indicated by the carry, it appears in two's complement form and needs to be converted back to the absolute difference. To facilitate this, a block of XOR gates is used, with the sum bit as one input and the generated carry as the other, to ensure that only the absolute difference between the co-ordinates reaches the two registers as the sides of the triangle.
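The conversion of a negative two's complement result back to a magnitude can be sketched in software. The example below is an illustration under stated assumptions: the subtracter's carry output is modelled as a borrow flag, every difference bit is XORed with that flag, and the flag is optionally added back as a final +1. The text mentions only the XOR gates, so the +1 step is an assumption; without it the magnitude of a negative result is low by one LSB. The names are hypothetical.

```python
WIDTH = 9
MASK = (1 << WIDTH) - 1

def subtract(r, s):
    """Return (difference bits, borrow) of a WIDTH-bit subtraction r - s."""
    borrow = 1 if r < s else 0
    return (r - s) & MASK, borrow

def abs_difference(r, s, add_one=True):
    """Magnitude of r - s obtained by conditionally inverting the difference."""
    diff, borrow = subtract(r, s)
    inverted = diff ^ (MASK if borrow else 0)   # XOR every bit with the borrow
    return inverted + borrow if add_one else inverted
```

With `add_one=True`, `abs_difference(3, 10)` gives 7; with `add_one=False` it gives 6, a one-LSB difference.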
Fig (4) Functional diagram to compute Euclidean distance (inputs: X-data and Y-data from the ADCs, X0-data and Y0-data from the data bank; blocks shown across Phase-I to Phase-IV: two subtracters with carry outputs, XOR gates, A-Register and B-Register, fast carry generators, 2x1 multiplexors, Max-Register and Min-Register, 9-bit subtracter, 9-bit ripple carry adder, result register, D latch and final result register)
Phase2: The second phase begins by sorting out the maximum and the minimum, so that they can be stored in two nine-bit registers and then passed on to Phase3, where the distance computation is made. The lines connecting the outputs of the A and B registers to the inputs of the D flip-flop based min and max registers are controlled by transmission gates operated by the carry from the fast carry generator circuit.
Phase3: In this phase the Euclidean distance is computed as per the above algorithm. The first step is to check whether min(a, b) is less than max(a, b)/4. To obtain max(a, b)/4, the seven MSB lines of the max register output are tapped and fed to the seven LSB lines of one data input of the subtracter; the remaining two MSB inputs are grounded, which ensures that the number is divided by 4. If the carry is set to 1, the result of the subtraction is added to max(a, b); otherwise max(a, b) itself acts as the distance. The distance is then passed on to the result register to proceed to the fourth phase.
Phase4: The final result register is preset to (111111111) as the initial distance. The first computed distance is compared with this preset value, and if the carry produced is zero, the content of the result register is pushed into the final result register through a D latch activated by the carry signal, indicating the lowest distance so far. During the second clock cycle, the second computed distance is likewise compared with the content of the final result register. If the second distance is smaller it is pushed into the final result register; if not, the previous distance is retained.
The process explained above needs four clock cycles to complete a distance computation. As discussed above, in a 16-state QAM application the minimum distance estimation needs at least four such computations to identify the constellation point. To speed up the process, a pipelined architecture can be implemented by sending the Cartesian co-ordinates of the constellation points, one after the other, from the presorted data bank to the Euclidean distance computation block in every subsequent clock period. The process then needs seven clock cycles to finish four distance calculations and output the shortest distance. Hence it is good to set the final result register content after eight clock cycles. To facilitate this, a negative-edge clock signal at one eighth of the system clock frequency is applied. To provide safe time margins between the ADC outputs, the ADC is given a start-of-conversion signal derived from one fourth of the system clock, so that the next ADC data enters the Euclidean distance block only after four clock cycles. Thus new received data is pumped in every four clock cycles, and the final shortest distance comes out eight clock cycles after that point.
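The search over the four points of the selected quadrant can be sketched as a running minimum that starts from the preset value (111111111). The code below is a software illustration, not the hardware itself: it assumes that the difference min(a, b) - max(a, b)/4 is halved before being added, as in the second approximation, and that a candidate replaces the stored distance only when it is strictly smaller. The coordinates and the labels "P1" to "P4" in the usage note are made-up values.

```python
PRESET = 0b111111111   # final result register starts at the largest 9-bit value

def approx_distance_int(a, b):
    """Shift-and-add two-segment approximation for non-negative integers."""
    hi, lo = max(a, b), min(a, b)
    quarter = hi >> 2
    return hi if lo < quarter else hi + ((lo - quarter) >> 1)

def nearest_point(rx, ry, candidates):
    """candidates: iterable of (x, y, label). Returns (label, distance) of the closest."""
    best_label, best_distance = None, PRESET
    for x, y, label in candidates:
        d = approx_distance_int(abs(rx - x), abs(ry - y))
        if d < best_distance:          # keep the previous value unless strictly smaller
            best_label, best_distance = label, d
    return best_label, best_distance
```

For example, with the hypothetical points `[(128, 128, "P1"), (128, 384, "P2"), (384, 128, "P3"), (384, 384, "P4")]` and a received point (150, 360), the function returns `("P2", 32)`. With four phases and four candidates, a pipeline that accepts one candidate per clock finishes in 4 + 3 = 7 cycles, which matches the count given above.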
IV. Control Logic.
Finding the minimum distance between the received signal point and the four points in the selected quadrant of the constellation diagram needs eight clock cycles. Since the ADC conversion time, and hence the availability of the digital data, is asynchronous with respect to the distance computation block, the data should be latched. There should also be provision for preserving the Cartesian co-ordinates of all 16 points of the constellation diagram. Control logic has been developed with these system requirements in view. It consists of three blocks, namely the data bank, selector line generation and the encoded signal output arrangement. The functionality of each is discussed below.
Control Signal generation: The ADC outputs are asynchronous and are available on the output lines of the ADC chip. The ADC indicates on its end-of-conversion (EOC) line that a conversion has just completed. A start-of-conversion signal must be generated to keep the ADC ready for the next conversion. The control logic starts with the EOC lines, which indicate the completion of data conversion by the two ADCs. The converted data is 10 bits long. These data lines are latched using D latches enabled by the EOC1 and EOC2 signals. The 9 data bits of each ADC are then passed on to D flip-flops that work on the positive edge of a clock at one fourth of the system clock frequency. This ensures that the ADC output data is preserved for at least 4 clock cycles and does not change in between. Similarly, the sign bits of these data lines are latched and then passed on to the decoder stage through positive-edge-triggered D flip-flops supplied with the one-fourth system clock. These two sign bits are used to enable one cluster of the data bank, which holds the Cartesian coordinates of four constellation points.
Also, as mentioned earlier, the minimum distance computation is completed and passed on to the output lines only after 8 clock cycles, so it is necessary to generate and provide control signals such as the final register preset line, the one-eighth system clock line and the SC1 and SC2 lines. Once the set of four constellation point data has been passed through the pipelined process, the ADCs need to be kept ready for the next data output. This requires the signals SC1 and SC2 to be generated and supplied to the ADCs. This is implemented using an AND gate with one input being one eighth of the system clock and the other being the inverted Load/Shift pin.
Fig (5) Schematic of the Control circuit (blocks shown: 10 lines each from ADC1 and ADC2, D latches, D flip-flops, 2x1 multiplexors, two 2x4 decoders, three T flip-flops and 9-line data paths to the Euclid block; signals shown: EOC1, EOC2, EM1 to EM4, LEN1 to LEN3, System CLK, CLK/8, preset to final result, SC1, SC2, LOAD/SHIFT, LX, LY)
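The derivation of the start-of-conversion signal can be illustrated with a small behavioural model. The sketch below assumes that the three T flip-flops of the control circuit form a ripple divider producing CLK/2, CLK/4 and CLK/8, and models them as a 3-bit counter; the SC signal is then the AND of CLK/8 with the inverted Load/Shift line, as described above. The function names are hypothetical and the model ignores gate delays.

```python
def divided_clocks(edges):
    """Yield (clk/2, clk/4, clk/8) levels after each rising edge of the system clock.

    Three cascaded toggle flip-flops are modelled as a 3-bit counter.
    """
    count = 0
    for _ in range(edges):
        count = (count + 1) % 8
        yield count & 1, (count >> 1) & 1, (count >> 2) & 1

def start_conversion(clk_div8, load_shift):
    """SC = (CLK/8) AND (NOT Load/Shift)."""
    return clk_div8 & (1 - load_shift)

if __name__ == "__main__":
    for div2, div4, div8 in divided_clocks(8):
        print(div2, div4, div8, start_conversion(div8, load_shift=0))
```

In this model the CLK/8 level changes once every four system clock edges, so SC is held low for the whole time the Load/Shift line is HIGH and otherwise follows CLK/8.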
Data Bank Circuit: The sign bits of the ADC outputs are used to generate enable signals that activate the data paths from one of the four clusters to the Euclidean block at a time, depending on the quadrant selected. The register banks form an integral part of the data bank, which holds 9 bits in the Xi bank, 9 bits in the Yi bank and 4 bits for the encoded QAM information of each point. In other words, a stack of 16 rows of 22-bit registers stores all the information belonging to the 16 QAM constellation points. The data bank has to be loaded initially by setting the Load/Shift line HIGH; the LX and LY lines select loading of the Xi and Yi coordinates. This selection is implemented using a separate decoder. Data is entered through the parallel-in pins and pushed up: the process resembles pushing the data in from the bottom and shifting it vertically up. Loading Xi, Yi and the corresponding Gray codes thus takes 48 clock cycles. The data is pushed up at the positive edge of the system clock. A 9-bit wide data path is provided for parallel loading of the Xi data, the Yi data or the Gray codes (on the lower 4 LSBs) into the register banks.
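The loading sequence can be pictured with a small behavioural model of the data bank. The sketch below is an assumption-laden illustration: each of the three banks (Xi, Yi and Gray code) is modelled as a 16-deep column into which one word is shifted per clock, so that filling all three takes 16 x 3 = 48 clocks. The class name `DataBank`, the bank names and the loaded values are hypothetical.

```python
from collections import deque

ROWS = 16
WIDTHS = {"x": 9, "y": 9, "code": 4}   # bits per word in each bank

class DataBank:
    def __init__(self):
        # One 16-deep column per bank; new words enter at one end and
        # the oldest word drops off the other end when the column is full.
        self.banks = {name: deque([0] * ROWS, maxlen=ROWS) for name in WIDTHS}
        self.clock_cycles = 0

    def push(self, bank, word):
        """Shift one word into the selected bank on a clock edge."""
        if not 0 <= word < (1 << WIDTHS[bank]):
            raise ValueError(f"{bank} word must fit in {WIDTHS[bank]} bits")
        self.banks[bank].append(word)
        self.clock_cycles += 1

    def row(self, index):
        """Return the (x, y, code) entry stored in one of the 16 rows."""
        return tuple(self.banks[name][index] for name in ("x", "y", "code"))
```

Pushing 16 words into each of the three banks leaves `clock_cycles` at 48, the figure quoted above, and each row then holds 9 + 9 + 4 = 22 bits.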
Fig (6) Circuit schematic of data storage arrangement (register bank to store data: clusters enabled by EM1 to EM4, each built from D flip-flop based X-Bank, Y-Bank and QAM registers; signals shown: System CLK, LEN1 to LEN3, Load/Shift, OE, 9 lines to load data, X-bus and Y-bus from the data bank, QAM encoded bus to output, final minimum distance to output)
QAM encoded Data output: The constellation points are usually Gray coded (4 bits each for 16-state QAM), and the code is to be made available at the output of the IC once the point in the diagram is identified. To facilitate this, and to stay synchronized with the distance computation process, a separate 4-bit wide data bus passing through 4-bit registers is designed. These registers are negative-edge triggered by the system clock.
Fig (7) 4-Bit bus arrangement for Encoded QAM Data output (rows of four D flip-flops; signals shown: CLOCK, C=0 from Phase 4 of Euclid block, data from QAM data bank)

Fig (8) Layout of all internal blocks (Euclidean distance computation block, data bank block, control signal block and encoded signal data bus; pins shown: 10 lines each from ADC1 and ADC2, 4 pins QAM output, 9-bit distance output, VDD, VSS, OE, EOC1, SC1, SC2, CLK, LX, LY)
V. System Integration
The Euclidean distance computation block and the control signal generation block were integrated. With custom LSI design as the objective, floor planning and routing were also tried out. All the basic leaf cells developed were redesigned and resized as well. The system was integrated for one-bit-level and nine-bit-level testing. The basic objective of the workshop was to give the participants a feel for custom LSI design. It was therefore targeted just to complete the algorithm development, SPICE simulation and layout practice. The basic leaf cells were all developed using Magic 7.0, with each one functionally simulated using the spice3 tool. The basic cells used are: inverter, 2-input NAND gate, 3-input NAND gate, 2X1 MUX, XOR gate, one-bit full adder, 2X4 decoder, fast carry generator, transmission gate, D flip-flop, D latch, T flip-flop, AND gate etc. These cells are then cascaded to obtain the entire functionality, as in any custom layout design. The cell designs, initially built from scratch, were then resized as per the circuit requirements. The resizing was done to ensure that loading effects are minimized. In some cases the cell layouts were rearranged to facilitate easy routing of the whole circuitry.
VI. Results
The results of SPICE simulation demonstrating the initial loading of the data bank and the control signal generation are presented in figures 9 and 10. Certain facts about the project implementation are given in table (1.0). These results show the concept to be fast, simple and easily implementable. The engineering approximations made are very reasonable for the application discussed.
Table (1.0) Distance computation Implementation - Facts
No.  Item                          Value
1    Total number of transistors   30000 (approx.)
2    Area of Euclidean block       2.0414 mm²
3    Area of control signals       0.61 mm²
4    Area of data storage          0.43 mm²
VII. Acknowledgement
The project team collectively acknowledges the contributions and moral support extended by the KarMic Manipal engineers' team in completely executing the assigned task. The discussions raised during the project period stimulated innovative thinking in every participant. The KarMic Manipal campus made it possible to believe strongly that good confidence can be developed using freely available software tools such as Magic and SPICE.
VIII. Conclusion
The workshop organized jointly by KarMic and the VLSI Society of India at the KarMic campus, Manipal, was intended to expose engineering college faculty and students to chip design activity. The project assignment, the development of an algorithm for Euclidean distance computation and its implementation, has been presented here as a paper. The concept has been presented with Excel-generated plots along with SPICE simulation results to show that the process is doable. However, the time available in the workshop and the background of the participants acted as constraints. All the participants collectively feel that, with a little more involvement, this implementation can be proved to be industry usable.
Fig (9) Different control signals for the entire process
Fig (10) Initial Data loading operation into the

REFERENCES
[1] Simon Haykin, Digital Communications, John Wiley & Sons, 2001.
[2] Proakis, Digital Communications, John Wiley & Sons, 2001.
[3] Marina L. G and Muhammed H A, Two algorithms for computing Euclidean distance transforms.
[4] Dr. Mahanth Shetti, et al., US Patent titled "System and method for approximating nonlinear functions", 22nd November 1994.
[5] Rabaey, Digital Integrated Circuits, Pearson.
