Fast Sine Cosine

Fast Sine/Cosine Assembler Subroutine for Phase Rotation
Introduction
In digital signal processing, the time-varying amplitude and phase of RF vectors are
represented in rectangular coordinates {I,Q}, or X = I + jQ in complex notation.
We often need to rotate a vector by a phase φ [rad]:
$$X_{rot} = I_{rot} + jQ_{rot} = X e^{j\varphi} = (I + jQ)(\cos\varphi + j\sin\varphi) = (I\cos\varphi - Q\sin\varphi) + j(I\sin\varphi + Q\cos\varphi)$$
or in matrix notation:
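$$\begin{pmatrix} I_{rot} \\ Q_{rot} \end{pmatrix} = \begin{pmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{pmatrix} \begin{pmatrix} I \\ Q \end{pmatrix}$$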
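As a minimal sketch of the rotation above (the function name rotate_iq is only an illustrative choice, not part of the original note), the matrix form can be written out directly and cross-checked against a complex multiplication:

```python
import math

def rotate_iq(i, q, phi):
    """Rotate the vector (i, q) by phi radians using the 2x2 rotation matrix."""
    c = math.cos(phi)
    s = math.sin(phi)
    i_rot = i * c - q * s   # real part: I cos(phi) - Q sin(phi)
    q_rot = i * s + q * c   # imaginary part: I sin(phi) + Q cos(phi)
    return i_rot, q_rot

# Cross-check against the complex form X * exp(j*phi).
i_rot, q_rot = rotate_iq(1.0, 0.5, 0.3)
x_rot = complex(1.0, 0.5) * complex(math.cos(0.3), math.sin(0.3))
print(i_rot - x_rot.real, q_rot - x_rot.imag)
```

The rotation costs four multiplications and two additions once the cosine and sine are known, so the cosine/sine evaluation itself is the part worth optimizing.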
We often also need a phase shift that depends linearly on frequency (a phase
offset plus a cable delay, either of which may be positive or negative), in which case we evaluate:
$$\varphi = \varphi_0 + \omega\, t_{delay} = \varphi_0 + 2\pi f\, t_{delay}$$
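A small sketch of this phase computation follows; the function name and the example numbers (a 1 MHz signal and a 250 ns delay) are assumptions made only for illustration:

```python
import math

def phase_shift(phi0, f, t_delay):
    """Phase in radians for offset phi0 [rad], frequency f [Hz], delay t_delay [s]."""
    return phi0 + 2.0 * math.pi * f * t_delay

# A 250 ns delay at 1 MHz is a quarter of a period, i.e. pi/2 radians.
phi = phase_shift(0.0, 1.0e6, 250.0e-9)
print(phi)
```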
Implementation of Fast sine/cosine Assembler Subroutine

The Analog Devices code examples at
http://www.analog.com/processors/processors/sharc/technicalLibrary/codeExamples/applicationsHandbook.html
link to a zip file containing a collection of assembler functions:
ftp://ftp.analog.com/pub/dsp/210xx/code_examples/apps_handbook_v1/function.zip
This collection contains an example sine/cosine subroutine written in assembler:
sin.asm. It takes 38 clock ticks per evaluation, or a total of 76 clock ticks to compute both
cosine and sine. Have a look at this assembler subroutine, but don't copy too many of the
bad features of this poorly written example!
These code examples were written using pre-VisualDSP++ tools, and they are optimized
for the ADSP-21020 architecture. To port the examples to the more recent SHARC
platforms (ADSP-2106x, ADSP-2116x) it is necessary to create a VisualDSP project and
make the appropriate changes.
We can do a lot faster than this (badly written) example by:
- retaining the sign of the phase argument after evaluating it modulo one quadrant
- reducing the argument range from 0 < φ < +π/2 to -π/4 < φ < +π/4
- using, instead of a Taylor series expansion of the sine and cosine, a
polynomial whose coefficients are obtained from a number of linear equations [pretty
straightforward using the POLYFIT function in MATLAB] in which the error has been
set to zero at strategic points in the argument range: equidistant on a φ^2 scale,
with a little fiddling to minimize the maximum error. This results in an error which
oscillates within fixed error limits, as seen by running the sin_poly3.m and
cos_poly3.m MATLAB scripts. For the 7th order sine polynomial, the error of the
optimized polynomial is below 1.9 × 10^-9, while the Taylor series
produces a maximum error of 3.1 × 10^-7. For the 6th order cosine
polynomial, the error of the optimized polynomial is below 3.2 × 10^-8,
while the Taylor series produces a maximum error of 3.6 × 10^-6. In both cases the
error is reduced by more than a factor of 100.
- with the reduced argument range and the optimized polynomial coefficients,
reducing the order of the sine polynomial from 15th order (odd terms only)
to 7th order for the sine polynomial and 6th order for the cosine polynomial, while
still remaining within an error of ±3 × 10^-8 (minimum LSB error, for a mantissa just
below 2) to ±6 × 10^-8 (maximum LSB error, for a mantissa just above 1), which is
the precision of an IEEE 32 bit float
- computing the cosine and sine polynomials (both 3rd order polynomials in φ^2)
simultaneously, using the SIMD dual data paths and dual arithmetic units of the
ADSP-2116x processor
- after evaluating sine and cosine in quadrant 0 (-π/4 to +π/4),
swapping cosine and sine and setting the signs to get the correct results in quadrants 1 (+π/4 to
+3π/4), 2 (+3π/4 to +5π/4), and 3 (+5π/4 to +7π/4)
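The Taylor series figures quoted in the list above can be checked with a short script. This is only a sketch: the helper names are illustrative, and the maximum is estimated on a uniform grid over -π/4 to +π/4 rather than found analytically.

```python
import math

def taylor_sin7(x):
    """Taylor series of sin(x) truncated after the x^7 term."""
    x2 = x * x
    return x * (1.0 + x2 * (-1.0 / 6.0 + x2 * (1.0 / 120.0 - x2 / 5040.0)))

def taylor_cos6(x):
    """Taylor series of cos(x) truncated after the x^6 term."""
    x2 = x * x
    return 1.0 + x2 * (-0.5 + x2 * (1.0 / 24.0 - x2 / 720.0))

def max_error(approx, exact, lo, hi, steps=2000):
    """Largest absolute difference between approx and exact on a uniform grid."""
    worst = 0.0
    for k in range(steps + 1):
        x = lo + (hi - lo) * k / steps
        worst = max(worst, abs(approx(x) - exact(x)))
    return worst

QUARTER_PI = math.pi / 4.0
sin_err = max_error(taylor_sin7, math.sin, -QUARTER_PI, QUARTER_PI)
cos_err = max_error(taylor_cos6, math.cos, -QUARTER_PI, QUARTER_PI)
print(f"Taylor sine error:   {sin_err:.2e}")
print(f"Taylor cosine error: {cos_err:.2e}")
```

The Taylor error is concentrated at the ends of the range, which is what a fitted polynomial with an oscillating error avoids.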
The polynomials required to obtain IEEE 32 bit float precision are (see the analysis
in the attached MATLAB routines sin_poly3.m and cos_poly3.m):
COS(X) = C0 + C2*X^2 + C4*X^4 + C6*X^6
SIN(X) = S1*X + S3*X^3 + S5*X^5 + S7*X^7 = X*(S1 + S3*X^2 + S5*X^4 + S7*X^6)
Range of X: -π/4 < X < +π/4
The second way of writing the sine polynomial gives the bracketed part the same form as the
cosine polynomial (3rd order in X^2), which allows both to be computed simultaneously in SIMD mode.
The polynomials can be rewritten in terms of a fractional quadrant argument Xq with
modified coefficients, which avoids converting back to radians after the integer and
fractional quadrant computation:
COS(Xq) = CQ0 + CQ2*Xq^2 + CQ4*Xq^4 + CQ6*Xq^6
SIN(Xq) = SQ1*Xq + SQ3*Xq^3 + SQ5*Xq^5 + SQ7*Xq^7 = Xq*(SQ1 + SQ3*Xq^2 + SQ5*Xq^4 + SQ7*Xq^6)
Range of Xq: -0.5 < Xq < +0.5
The coefficients are:

CQ6 = -2.0427240364907607e-002
CQ4 =  2.5360671639164339e-001
CQ2 = -1.2336979844380824e+000
CQ0 =  1.0000000000000000e+000
SQ7 = -4.6002309092153379e-003
SQ5 =  7.9679708649230657e-002
SQ3 = -6.4596348437163809e-001
SQ1 =  1.5707963267948966e+000
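The coefficients listed above can be tried out in nested (Horner) form, which is also the form the SIMD loop would use. The following is a sketch in double precision; the function name cos_sin_quadrant and the grid-based error scan are assumptions for illustration, not the MATLAB scripts mentioned in the text.

```python
import math

# Quadrant-unit coefficients as listed above.
CQ = (1.0000000000000000e+000, -1.2336979844380824e+000,
      2.5360671639164339e-001, -2.0427240364907607e-002)   # CQ0, CQ2, CQ4, CQ6
SQ = (1.5707963267948966e+000, -6.4596348437163809e-001,
      7.9679708649230657e-002, -4.6002309092153379e-003)   # SQ1, SQ3, SQ5, SQ7

def cos_sin_quadrant(xq):
    """Return (cos, sin) of xq quadrants (xq * pi/2 rad), for -0.5 <= xq <= 0.5."""
    xq2 = xq * xq
    c = CQ[0] + xq2 * (CQ[1] + xq2 * (CQ[2] + xq2 * CQ[3]))
    s = xq * (SQ[0] + xq2 * (SQ[1] + xq2 * (SQ[2] + xq2 * SQ[3])))
    return c, s

# Scan the argument range and record the largest deviation from math.cos/math.sin.
worst_cos = 0.0
worst_sin = 0.0
for k in range(-500, 501):
    xq = k / 1000.0
    c, s = cos_sin_quadrant(xq)
    angle = xq * math.pi / 2.0
    worst_cos = max(worst_cos, abs(c - math.cos(angle)))
    worst_sin = max(worst_sin, abs(s - math.sin(angle)))
print(f"max cosine error: {worst_cos:.2e}")
print(f"max sine error:   {worst_sin:.2e}")
```

The two maxima can be compared with the error limits quoted earlier for the optimized polynomials.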
The subroutine EXP_J(X) should put priority on speed and might look something like
this:
Subroutine to compute the sine and cosine values of a floating point input:
    Y1 = COS(X) and Y2 = SIN(X)
Calling Registers
    F0 = Input value X [radians]
Result Registers
    F0 = Cosine of input, Y = [-1,1] - real part
    F1 = Sine of input, Y = [-1,1] - imaginary part
Altered Registers
    F0, F1, (to be determined)
    IX (to be determined)
    Compatible with register use in the fast ISR and, if possible, with
    register use in the VisualDSP++ C environment, such that it can also
    be called from C (or make another version of the code)
Computation Time
    XX Cycles
Suggested implementation flow as follows:

1. If necessary, save registers which may be needed and which must be preserved in the
current environment.
2. Convert the input argument X from floating point radian units (360 degrees = 2π) to
floating point quadrant units (360 degrees = 4) by multiplying by the constant 2/π.
3. Split the quadrant value into an integer part quad_int (round to the nearest integer using the FIX
instruction with TRUNC bit = 0) and a fraction quad_frac (subtract float(quad_int) from the
quadrant value; range -0.5 < quad_frac < +0.5).
Evaluate quad_int modulo 4 (the two least significant bits: 0, 1, 2 or 3).
4. Prepare for polynomial evaluation:

i) set SIMD mode
ii) evaluate quad_frac_squared = quad_frac^2
iii) set up the index register, loop counter etc. (or use inline coding if faster)
5. The SIMD polynomial loop then proceeds as follows (total of 6 clock ticks):
Oper:      PEx                                        PEy
Initial    Xq^2*CQ6                                   Xq^2*SQ7
Add next   CQ4 + Xq^2*CQ6                             SQ5 + Xq^2*SQ7
Mult Xq^2  Xq^2*(CQ4 + Xq^2*CQ6)                      Xq^2*(SQ5 + Xq^2*SQ7)
Add next   CQ2 + Xq^2*(CQ4 + Xq^2*CQ6)                SQ3 + Xq^2*(SQ5 + Xq^2*SQ7)
Mult Xq^2  Xq^2*(CQ2 + Xq^2*(CQ4 + Xq^2*CQ6))         Xq^2*(SQ3 + Xq^2*(SQ5 + Xq^2*SQ7))
Add next   CQ0 + Xq^2*(CQ2 + Xq^2*(CQ4 + Xq^2*CQ6))   SQ1 + Xq^2*(SQ3 + Xq^2*(SQ5 + Xq^2*SQ7))
6. This is almost the desired result; only the sine polynomial still needs to be multiplied by
Xq to get the sine value:
Xq*(SQ1 + Xq^2*(SQ3 + Xq^2*(SQ5 + Xq^2*SQ7)))
7. Assign the cosine and sine results according to quadrant from the appropriate PEx and PEy
registers:
quad_int   F0     F1     Action
0 = 00     COS    SIN    (none)
1 = 01     -SIN   COS    swap results, invert sine
2 = 10     -COS   -SIN   invert signs
3 = 11     SIN    -COS   swap results, invert cosine
Try to find a way to use the 2 LSB in a fast computed branch or test MSB to change signs
and test LSB to swap results around.
8. Reset the SIMD mode back to normal, restore saved registers if needed. We are done.
Return.
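Steps 2 to 7 above can be sketched in a few lines to check the quadrant logic before writing the assembler. This is a double precision model under stated assumptions, not the SHARC routine: the name exp_j mirrors EXP_J(X), rounding to nearest is modelled with floor(x + 0.5) as a stand-in for the FIX instruction, and the quadrant fix-up is written as two tests on the two least significant bits.

```python
import math

TWO_OVER_PI = 2.0 / math.pi
CQ0, CQ2, CQ4, CQ6 = (1.0000000000000000e+000, -1.2336979844380824e+000,
                      2.5360671639164339e-001, -2.0427240364907607e-002)
SQ1, SQ3, SQ5, SQ7 = (1.5707963267948966e+000, -6.4596348437163809e-001,
                      7.9679708649230657e-002, -4.6002309092153379e-003)

def exp_j(x):
    """Return (cos(x), sin(x)) for x in radians, following steps 2 to 7."""
    xq = x * TWO_OVER_PI                 # step 2: radians -> quadrant units
    quad_int = math.floor(xq + 0.5)      # step 3: round to nearest (assumed model of FIX)
    quad_frac = xq - quad_int            # fraction, -0.5 <= quad_frac < 0.5
    quadrant = quad_int & 3              # quad_int modulo 4 (two LSBs)
    xq2 = quad_frac * quad_frac          # step 4: squared argument
    # Steps 5 and 6: both polynomials in nested form (PEx and PEy on the DSP).
    c = CQ0 + xq2 * (CQ2 + xq2 * (CQ4 + xq2 * CQ6))
    s = quad_frac * (SQ1 + xq2 * (SQ3 + xq2 * (SQ5 + xq2 * SQ7)))
    # Step 7: LSB set -> rotate by one quadrant, MSB set -> rotate by two.
    if quadrant & 1:
        c, s = -s, c
    if quadrant & 2:
        c, s = -c, -s
    return c, s

for x in (0.0, 1.0, 2.5, 4.0, 6.0):
    c, s = exp_j(x)
    print(f"{x:4.1f}  {c - math.cos(x):+.1e}  {s - math.sin(x):+.1e}")
```

Writing the fix-up as "LSB rotates by one quadrant, MSB rotates by two" reproduces all four rows of the table in step 7 and may map onto two bit tests instead of a four-way branch.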
Two MATLAB functions have been written to evaluate the polynomial approximations
above: exp_jd.m, which uses standard double precision arithmetic, and exp_js.m, which uses
single precision polynomial coefficients (necessary to exploit the SIMD architecture), single
precision rounding of the input argument, a single precision constant for the radian to quadrant
conversion (2/π) and single precision rounding of the final result. These functions require that
sin_poly3.m and cos_poly3.m have been run beforehand to generate the polynomial
coefficients in the MATLAB workspace.
A MATLAB script plot_functions.m is available to plot the errors for the two cases in the
argument range from -4π to +4π, as well as printing the expected numerical values
on the MATLAB console. For large arguments (outside the first quadrant), the single
precision rounding of the input argument and of the constant for the radian to quadrant conversion
(2/π) clearly dominates the errors of the results.
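The effect of the single precision argument conversion can be illustrated as follows. This is a sketch with assumed helper names; it emulates 32 bit rounding with the standard struct module and is not a reimplementation of exp_js.m.

```python
import math
import struct

def to_float32(value):
    """Round a Python float (64 bit) to the nearest IEEE 32 bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

# Single precision version of the radian to quadrant constant.
TWO_OVER_PI_32 = to_float32(2.0 / math.pi)

def quadrant_error(x):
    """Error, in quadrant units, from rounding x, 2/pi and their product to 32 bits."""
    exact = x * (2.0 / math.pi)
    single = to_float32(to_float32(x) * TWO_OVER_PI_32)
    return abs(single - exact)

print(f"error of 2/pi constant: {abs(TWO_OVER_PI_32 - 2.0 / math.pi):.1e}")
for x in (0.5, 6.0, 12.0):
    print(f"x = {x:5.1f} rad: quadrant error = {quadrant_error(x):.1e}")
```

Because the absolute spacing of 32 bit floats grows with magnitude, the error of the quadrant value is bounded by a quantity that scales with the size of the argument, whereas the polynomial error stays fixed.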
[Figure: Quadrant integer, quadrant fraction, cosine and sine outputs]
[Figure: Errors for sine (red) and cosine (blue) using double precision (64 bit floats) computations]
[Figure: Errors for sine (red) and cosine (blue) using single precision (32 bit floats) computations]
phi COS COS_pol SIN SIN_pol
0.00 1.000000000 1.000000000 0.000000000 0.000000000
0.05 0.998750260 0.998750269 0.049979169 0.049979169
0.10 0.995004165 0.995004177 0.099833417 0.099833414
0.15 0.988771078 0.988771081 0.149438132 0.149438143
0.20 0.980066578 0.980066597 0.198669331 0.198669329
0.25 0.968912422 0.968912482 0.247403959 0.247403964
0.30 0.955336489 0.955336511 0.295520207 0.295520216
0.35 0.939372713 0.939372718 0.342897807 0.342897803
0.40 0.921060994 0.921060979 0.389418342 0.389418334
0.45 0.900447102 0.900447130 0.434965534 0.434965521
0.50 0.877582562 0.877582550 0.479425539 0.479425520
0.55 0.852524522 0.852524519 0.522687229 0.522687256
0.60 0.825335615 0.825335562 0.564642473 0.564642489
0.65 0.796083799 0.796083808 0.605186406 0.605186403
0.70 0.764842187 0.764842212 0.644217687 0.644217670
0.75 0.731688869 0.731688917 0.681638760 0.681638777
0.80 0.696706709 0.696706772 0.717356091 0.717356086
0.85 0.659983146 0.659983158 0.751280405 0.751280427
0.90 0.621609968 0.621610045 0.783326910 0.783326864
0.95 0.581683089 0.581683159 0.813415505 0.813415468
1.00 0.540302306 0.540302336 0.841470985 0.841470957
1.05 0.497571048 0.497571141 0.867423226 0.867423117
1.10 0.453596121 0.453596145 0.891207360 0.891207337
1.15 0.408487441 0.408487529 0.912763940 0.912763894
1.20 0.362357754 0.362357765 0.932039086 0.932039082
1.25 0.315322362 0.315322429 0.948984619 0.948984623
1.30 0.267498829 0.267498940 0.963558185 0.963558197
1.35 0.219006687 0.219006717 0.975723358 0.975723386
1.40 0.169967143 0.169967234 0.985449730 0.985449731
1.45 0.120502769 0.120502785 0.992712991 0.992712975
1.50 0.070737202 0.070737265 0.997494987 0.997494996
1.55 0.020794828 0.020794939 0.999783764 0.999783754
1.60 -0.029199522 -0.029199483 0.999573603 0.999573588
1.65 -0.079120889 -0.079120800 0.996865028 0.996865034
1.70 -0.128844494 -0.128844485 0.991664810 0.991664827
1.75 -0.178246056 -0.178245991 0.983985947 0.983985960
1.80 -0.227202095 -0.227201983 0.973847631 0.973847687
1.85 -0.275590247 -0.275590211 0.961275203 0.961275220
1.90 -0.323289567 -0.323289484 0.946300088 0.946300149
1.95 -0.370180831 -0.370180815 0.928959715 0.928959727
2.00 -0.416146837 -0.416146785 0.909297427 0.909297466
2.05 -0.461072691 -0.461072594 0.887362369 0.887362421
2.10 -0.504846105 -0.504845977 0.863209367 0.863209426
2.15 -0.547357665 -0.547357678 0.836898791 0.836898744
2.20 -0.588501117 -0.588501096 0.808496404 0.808496416
2.25 -0.628173623 -0.628173590 0.778073197 0.778073251
2.30 -0.666276021 -0.666275918 0.745705212 0.745705307
2.35 -0.702713077 -0.702712953 0.711473353 0.711473465
2.40 -0.737393716 -0.737393737 0.675463181 0.675463200
2.45 -0.770231254 -0.770231247 0.637764702 0.637764752
2.50 -0.801143616 -0.801143527 0.598472144 0.598472238
2.55 -0.830053535 -0.830053449 0.557683717 0.557683885
2.60 -0.856888753 -0.856888592 0.515501372 0.515501559
2.65 -0.881582196 -0.881582141 0.472030541 0.472030550
2.70 -0.904072142 -0.904072106 0.427379880 0.427379936
2.75 -0.924302379 -0.924302340 0.381660992 0.381661117
2.80 -0.942222341 -0.942222297 0.334988150 0.334988326
2.85 -0.957787238 -0.957787216 0.287478012 0.287478238
2.90 -0.970958165 -0.970958173 0.239249329 0.239249364
2.95 -0.981702203 -0.981702209 0.190422647 0.190422729
3.00 -0.989992497 -0.989992499 0.141120008 0.141120136
3.05 -0.995808325 -0.995808303 0.091464642 0.091464818
3.10 -0.999135150 -0.999135137 0.041580662 0.041580886
3.15 -0.999964658 -0.999964654 -0.008407247 -0.008407216
3.20 -0.998294776 -0.998294771 -0.058374143 -0.058374062
3.25 -0.994129676 -0.994129717 -0.108195135 -0.108195007
3.30 -0.987479770 -0.987479806 -0.157745694 -0.157745525
3.35 -0.978361679 -0.978361726 -0.206901972 -0.206901759
3.40 -0.966798193 -0.966798246 -0.255541102 -0.255541056
3.45 -0.952818215 -0.952818274 -0.303541513 -0.303541422
3.50 -0.936456687 -0.936456740 -0.350783228 -0.350783110
3.55 -0.917754506 -0.917754591 -0.397148167 -0.397148013
3.60 -0.896758416 -0.896758497 -0.442520443 -0.442520231
3.65 -0.873520898 -0.873520911 -0.486786649 -0.486786604
3.70 -0.848100032 -0.848100066 -0.529836141 -0.529836059
3.75 -0.820559357 -0.820559442 -0.571561319 -0.571561217
3.80 -0.790967712 -0.790967822 -0.611857891 -0.611857772
3.85 -0.759399059 -0.759399235 -0.650625137 -0.650624990
3.90 -0.725932304 -0.725932360 -0.687766159 -0.687766135
3.95 -0.690651097 -0.690651178 -0.723188124 -0.723188043
4.00 -0.653643621 -0.653643787 -0.756802495 -0.756802440
4.05 -0.615002377 -0.615002394 -0.788525254 -0.788525283
4.10 -0.574823947 -0.574824154 -0.818277111 -0.818276942
4.15 -0.533208756 -0.533208847 -0.845983701 -0.845983624
4.20 -0.490260821 -0.490261137 -0.871575772 -0.871575534
4.25 -0.446087490 -0.446087658 -0.894989358 -0.894989252
4.30 -0.400799172 -0.400799155 -0.916165937 -0.916165948
4.35 -0.354509065 -0.354509324 -0.935052578 -0.935052514
4.40 -0.307332870 -0.307332963 -0.951602074 -0.951602101
4.45 -0.259388503 -0.259388864 -0.965773061 -0.965772986
4.50 -0.210795799 -0.210795984 -0.977530118 -0.977530122
4.55 -0.161676216 -0.161676213 -0.986843859 -0.986843884
4.60 -0.112152527 -0.112152807 -0.993691004 -0.993690968
4.65 -0.062348515 -0.062348608 -0.998054439 -0.998054445
4.70 -0.012388663 -0.012389044 -0.999923258 -0.999923229
4.75 0.037602153 0.037601963 -0.999292789 -0.999292791
4.80 0.087498983 0.087498985 -0.996164609 -0.996164620
4.85 0.137177112 0.137176827 -0.990546536 -0.990546584
4.90 0.186512369 0.186512277 -0.982452613 -0.982452631
4.95 0.235381443 0.235381067 -0.971903069 -0.971903205
5.00 0.283662185 0.283661991 -0.958924275 -0.958924353
5.05 0.331233920 0.331233919 -0.943548669 -0.943548679
5.10 0.377977743 0.377977490 -0.925814682 -0.925814807
5.15 0.423776818 0.423776716 -0.905766641 -0.905766666
5.20 0.468516671 0.468516320 -0.883454656 -0.883454800
5.25 0.512085477 0.512085319 -0.858934493 -0.858934581
5.30 0.554374336 0.554374337 -0.832267442 -0.832267404
5.35 0.595277548 0.595277309 -0.803520156 -0.803520322
5.40 0.634692876 0.634692788 -0.772764488 -0.772764564
5.45 0.672521802 0.672521532 -0.740077310 -0.740077615
5.50 0.708669774 0.708669603 -0.705540326 -0.705540478
5.55 0.743046441 0.743046463 -0.669239857 -0.669239879
5.60 0.775565879 0.775565684 -0.631266638 -0.631266892
5.65 0.806146805 0.806146741 -0.591715581 -0.591715693
5.70 0.834712785 0.834712505 -0.550685543 -0.550685883
5.75 0.861192417 0.861192286 -0.508279077 -0.508279264
5.80 0.885519517 0.885519445 -0.464602179 -0.464602232
5.85 0.907633279 0.907633126 -0.419764018 -0.419764340
5.90 0.927478431 0.927478373 -0.373876665 -0.373876810
5.95 0.945005369 0.945005238 -0.327054815 -0.327055246
6.00 0.960170287 0.960170269 -0.279415498 -0.279415727
6.05 0.972935278 0.972935319 -0.231077788 -0.231077850
6.10 0.983268438 0.983268380 -0.182162504 -0.182162851
6.15 0.991143940 0.991143942 -0.132791909 -0.132792071
6.20 0.996542097 0.996542096 -0.083089403 -0.083089843
6.25 0.999449418 0.999449432 -0.033179217 -0.033179469
6.30 0.999858636 0.999858618 0.016813900 0.016813837
