
Project Report

Project Objectives
This project aims to develop basic assembly language concepts by implementing the
weighted sum of a k-dimensional weight vector w and a k-dimensional input vector x.
It also introduces how assembly language functions can be integrated with the C
language.
Part 1
The first part of the project requires the computation of a weighted sum governed by
the equation

$S = \sum_{n=1}^{k} w_n x_n$.
The function is called from the C programme in main.c as asm_wsum(int* w, int* x,
int k). The three parameters passed to asm_wsum are the starting addresses of the
arrays w and x in memory and the dimension k. They are passed in registers R0, R1
and R2 respectively.
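As a point of reference, the same computation can be sketched in plain C. The name wsum_ref is hypothetical and not part of the project code; it only mirrors the asm_wsum(int* w, int* x, int k) signature so that the assembly result can be compared against it.

```c
#include <assert.h>

/* Hypothetical C reference for S = w[0]*x[0] + ... + w[k-1]*x[k-1].
   Not part of the original project; mirrors the asm_wsum signature. */
int wsum_ref(const int *w, const int *x, int k)
{
    assert(w != 0 && x != 0 && k >= 0);  /* basic argument sanity check */
    int sum = 0;
    for (int n = 0; n < k; n++) {
        sum += w[n] * x[n];              /* one multiply-accumulate per element */
    }
    return sum;
}
```

Calling wsum_ref and asm_wsum on the same small arrays is one simple way to check the assembly routine.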
The implementation of our asm_wsum is as follows (refer to the Appendix for the
flowchart):
asm_wsum.s
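      PUSH {R3,R4,R5}     @ save R3-R5 on the stack
      MOV R4,#0           @ clear the accumulator

LOOP: LDR R3,[R0],#4      @ load next w element, then advance R0 by 4
      LDR R5,[R1],#4      @ load next x element, then advance R1 by 4
      MLA R4,R5,R3,R4     @ R4 = R5*R3 + R4
      SUBS R2,R2,#1       @ decrement the counter and update the flags
      BNE LOOP            @ repeat while the counter is non-zero
      MOV R0,R4           @ the return value goes in R0

      POP {R3,R4,R5}      @ restore R3-R5
      BX LR               @ return to the caller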

In our implementation, the $w_n$ and $x_n$ elements are loaded into registers R3 and
R5 respectively. Register R4 is used to store the accumulated sum of $w_n x_n$.
A PUSH instruction first saves the original contents of registers R3-R5 on the
stack. Register R4 is then cleared to 0 with a MOV instruction, which is possible
because the immediate #0 satisfies the requirement of Operand2.
The $w_n$ and $x_n$ elements are loaded into registers R3 and R5 using post-indexed
addressing. The starting addresses of the w and x arrays, held in R0 and R1, are
used for the memory accesses. After each load, the address incremented by the offset
#4 is written back to R0 and R1 so that the next element of each array is accessed
in the next iteration. As a word on the ARM Cortex-M3 is 32 bits long and the memory
locations are byte-addressable, the address has to be incremented by 4 to reach the
next w and x element in memory.
The product $w_n x_n$ is computed and added to the accumulated sum stored in R4 by
the instruction:
MLA R4,R5,R3,R4
The value of k in register R2 serves as a counter for the number of products still
to be accumulated and is decremented by 1 after each computation. The instruction
SUBS is used instead of SUB so that the condition flags are updated and can be used
to decide whether to branch.
While R2 is not equal to zero, the Z flag is cleared to 0. Since BNE branches when
Z=0, the asm_wsum function branches back to the location indicated by the LOOP
label.
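The loop structure described above can be mirrored in C as a rough sketch. The function name wsum_like_asm is hypothetical; the comments map each statement to the corresponding instruction in asm_wsum.s.

```c
#include <assert.h>

/* Hypothetical C mirror of the assembly loop (names are illustrative). */
int wsum_like_asm(const int *w, const int *x, int k)
{
    assert(k > 0);        /* the bottom-tested loop always runs at least once */
    int acc = 0;          /* MOV  R4,#0 */
    do {
        int wn = *w++;    /* LDR  R3,[R0],#4 : load, then advance the pointer */
        int xn = *x++;    /* LDR  R5,[R1],#4 */
        acc += xn * wn;   /* MLA  R4,R5,R3,R4 */
        k--;              /* SUBS R2,R2,#1 */
    } while (k != 0);     /* BNE  LOOP : repeat while Z = 0 */
    return acc;           /* MOV  R0,R4 */
}
```

Because the counter is tested only after the decrement, this form assumes k is at least 1; with k = 0 the counter would wrap around instead of stopping.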
After executing k iterations of the loop (6 in this project), the MOV instruction
moves the accumulated sum stored in R4 to R0, because the return value of the
assembly language function is passed back to the C programme through the R0
register.
Finally, the POP instruction restores the contents saved on the stack to their
original registers R3-R5. The assembly programme ends with BX LR to return to the
calling C programme.
When we first compiled the skeleton programme, the large integer value displayed was
268468124, which corresponds to 0x10007F9C in hexadecimal notation. This is the
starting address of the w array: it is passed to the assembly language function in
R0 and returned through the same register, the content of which is not altered in
the skeleton programme.
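The decimal-to-hexadecimal correspondence can be checked by expanding the hexadecimal digits:

```latex
\begin{aligned}
\mathtt{0x10007F9C} &= 1\cdot 16^{7} + 7\cdot 16^{3} + 15\cdot 16^{2} + 9\cdot 16^{1} + 12\cdot 16^{0} \\
&= 268435456 + 28672 + 3840 + 144 + 12 \\
&= 268468124
\end{aligned}
```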
Part 2
The second part of the project requires the computation of the Weighted Moving
Averages (WMA) of floating point numbers w and x, governed by the equation

$S_t = \sum_{n=1}^{k} w_n x_{t+1-n}$, where $t \ge k$ and $k = 6$,

using two methods. The first method implements the computation by a combination of
C program statements and the asm_wsum function developed in Part 1, while the second
method uses pure C program statements.
It is also required to use the msTicks counter to determine the time taken (in us)
to compute $S_t$ for t=6 to 33.
The skeleton program main.c defines three arrays of the type float, namely wf, xf
and s. The array wf contains the weights w in reverse order. The array xf contains
the input vector x, with xf[t-1] being the value of $x_t$ for t=1 to 33. The array s
is used to store the $S_t$ values, with s[t-6] being the value of $S_t$ for t=6 to
33.
The expansion of $S_t$ for t=6 is
$w_1 x_6 + w_2 x_5 + w_3 x_4 + w_4 x_3 + w_5 x_2 + w_6 x_1$. By inspection and
induction, the window of six input values shifts down the xf array by 1 each time t
increases by 1. In addition, since the weights w are stored in array wf in reverse
order, i.e. wf[6-n] is $w_n$, the term $w_n x_{t+1-n}$ is expressed as
wf[6-n]*xf[t-n] for t=6 to 33. If (6-n) is denoted by m and t is re-indexed to run
from 0 to 27, the term can be written as wf[m]*xf[t+6-n], that is, the equivalent
expression is wf[m]*xf[t+m].
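The index substitution above can be sketched as two small C functions that should return the same value. Both function names and the macro K are hypothetical and only serve to compare the two ways of indexing.

```c
#include <assert.h>

#define K 6  /* window length, as in the report */

/* S_t written with the report's 1-based indices:
   sum over n = 1..6 of wf[6-n] * xf[t-n], valid for t = 6..33. */
float wma_math_index(const float *wf, const float *xf, int t)
{
    assert(t >= K);
    float s = 0.0f;
    for (int n = 1; n <= K; n++) {
        s += wf[K - n] * xf[t - n];
    }
    return s;
}

/* Same sum after substituting m = 6-n and shifting t to start at 0:
   sum over m = 0..5 of wf[m] * xf[t0+m], valid for t0 = 0..27. */
float wma_shifted_index(const float *wf, const float *xf, int t0)
{
    assert(t0 >= 0);
    float s = 0.0f;
    for (int m = 0; m < K; m++) {
        s += wf[m] * xf[t0 + m];
    }
    return s;
}
```

For any valid t, wma_math_index(wf, xf, t) and wma_shifted_index(wf, xf, t - 6) add the same six products, only in the opposite order.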
The implementation of our main.c using the first method is as follows:
timer=msTicks;
for(k=0; k<6; k++)
{
    wf1[k]=wf[k]*10000;
}
for(t=0; t<33; t++)
{
    xf1[t]=xf[t]*10000;
}
for(t=0; t<28; t++)
{
    s[t]=asm_wsum(wf1, xf1+t, 6)*0.00000001;
}
timer1=msTicks-timer;
printf("time used for part 2 asm version is %d us\n", timer1);
for(t=0; t<28; t++)
{
    printf("s[%d] is %lf\n", t, s[t]);
}
The asm_wsum function requires integer parameters. However, the elements of wf and
xf are of type float. Therefore, we multiply each element of wf and xf by 10000 and
store the results in two new integer arrays, wf1 and xf1. This converts the elements
of wf and xf to integer type while keeping four decimal places.
To compute $S_t$ for t=6 to 33, the function is called as asm_wsum(wf1, xf1+t, 6)
for each value of t. The first parameter is the starting address of the array wf1.
The second parameter is the starting address of six consecutive elements in the xf1
array. It corresponds to the address of the element xf1[t], where t is in the range
of 0 to 27, and can be equivalently written as xf1+t. The third parameter is the
dimension k, which equals 6 in this case.
The return value of the function asm_wsum is an integer, but $S_t$ is a floating
point value. Hence, the return value is multiplied by the floating point number
0.00000001 so that the result stored into s[t] is converted to floating point type.
The multiplication by 0.00000001 also restores the correct magnitude of the value
stored in s[t], which was magnified by 10000*10000=100000000 times due to the
conversion of the wf and xf elements into integer type.
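The scaling scheme can be sketched end to end in C. This is a minimal illustration under stated assumptions: wsum_int is a hypothetical plain-C stand-in for asm_wsum so that the sketch runs without the assembly file, and scale_to_int, wsum_via_int and SCALE are illustrative names that do not appear in the project.

```c
#include <assert.h>

#define SCALE 10000  /* scaling factor used in the report */

/* Hypothetical stand-in for asm_wsum (integer weighted sum). */
static int wsum_int(const int *w, const int *x, int k)
{
    int sum = 0;
    for (int n = 0; n < k; n++) {
        sum += w[n] * x[n];
    }
    return sum;
}

/* Scale k floats into integers; the conversion truncates toward zero,
   so digits beyond the fourth decimal place are lost. */
static void scale_to_int(const float *src, int *dst, int k)
{
    for (int n = 0; n < k; n++) {
        dst[n] = (int)(src[n] * SCALE);
    }
}

/* Weighted sum of up to 6 float pairs computed through the integer routine. */
float wsum_via_int(const float *wf, const float *xf, int k)
{
    int wi[6], xi[6];
    assert(k >= 0 && k <= 6);
    scale_to_int(wf, wi, k);
    scale_to_int(xf, xi, k);
    /* Each product carries SCALE*SCALE = 1e8, so that factor is divided out. */
    return (float)(wsum_int(wi, xi, k) * 0.00000001);
}
```

With wf = {0.5, 0.25} and xf = {0.5, 2.0}, the scaled integer sum is 75000000, which maps back to approximately 0.75.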
Timing of the program starts before the conversion procedure. The variable timer is
used to capture the start time. Timing stops after the computation of s[t]. The
execution time of the program is calculated as the difference between the current
value of msTicks and timer. The result is then stored into timer1.
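The subtraction of the start time from the current counter value can be sketched as a small helper. This is an illustration only: it assumes msTicks is an unsigned 32-bit counter, which the report does not state, and the name elapsed_us and the parameter us_per_tick are hypothetical because the tick period is not given.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed time between two readings of a free-running tick counter.
   Assumption: the counter is an unsigned 32-bit value, so the
   subtraction is modulo 2^32 and stays correct across one wrap-around.
   us_per_tick is a hypothetical parameter for the tick period. */
uint32_t elapsed_us(uint32_t start, uint32_t now, uint32_t us_per_tick)
{
    assert(us_per_tick > 0);
    return (now - start) * us_per_tick;
}
```

With a tick period of 1 us, readings of 100 and 348 give an elapsed time of 248 us.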
The execution time and the computed $S_t$ values are subsequently displayed using
the printf function.
The second method is implemented as follows:
timer=msTicks;
for(t=0; t<28; t++)
{
    s[t]=0;
    for(k=0; k<6; k++)
    {
        s[t]=s[t]+wf[k]*xf[t+k];
    }
}
timer2=msTicks-timer;
printf("time used for part 2 pure C version is %d us\n", timer2);
for(t=0; t<28; t++)
{
    printf("s[%d] is %lf\n", t, s[t]);
}
In pure C program statements, computation involving floating point numbers can be
implemented directly without type casting. Hence, for each t=6 to 33, $S_t$ is the
accumulated sum of wf[k]*xf[t+k], where k=0 to 5 and the array index t runs from 0
to 27. The WMA is computed using two nested loops. The inner loop calculates the
WMA for a single $S_t$. It is important to initialise s[t] to 0 before entering the
inner loop. The outer loop runs 28 times to compute $S_t$ from t=6 to 33.
Timing of the program starts before entering the outer loop. The variable timer is
used to capture the start time. Timing stops after completion of the outer loop. The
execution time of the program is calculated as the difference between the current
value of msTicks and timer. The result is stored into timer2.
The execution time and the computed $S_t$ values are subsequently displayed using
the printf function.
Problems and Solutions
Programme Optimization
It is good programming practice to use PUSH to save the initial register contents
on the stack and to restore them later with POP whenever a function uses registers
that may also be in use by another function. In Part 2 of the project, the assembly
programme took 1149us to compute the WMA without the PUSH and POP instructions,
compared to 248us when they were used. The reason could be that, without them,
extra time is spent processing the contents that were stored in registers R3-R5.
Overflow
When casting an integer to a floating point type, it is important to consider the
ranges of both types. For example, in our tests the statement
s[t]=asm_wsum(wf1,xf1+t,6) triggered the Hard Fault Handler, which we attributed to
overflow during value assignment. The return value of the function is an integer of
magnitude around 10^9, which we believed overshoots the range of the 8-bit exponent
field of a floating point number. Therefore, the return value needs to be scaled
down by a factor of 100000000 before it is assigned to the floating point type
variable s[t].
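On the integer side of the range question, a 32-bit signed int holds values only up to 2147483647, so a scaled sum of magnitude 10^9 is already close to that limit. The following sketch, with the hypothetical name wsum_fits_int32, accumulates in 64 bits to check whether a scaled weighted sum would still fit in the 32-bit return value. It is not part of the project and does not by itself explain the Hard Fault.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: computes the scaled integer weighted sum in 64 bits
   and reports whether it would fit in a 32-bit signed int.
   Returns 1 if the sum fits, 0 otherwise; the exact sum is stored in *out. */
int wsum_fits_int32(const int32_t *w, const int32_t *x, int k, int64_t *out)
{
    assert(k >= 0 && out != 0);
    int64_t sum = 0;
    for (int n = 0; n < k; n++) {
        sum += (int64_t)w[n] * x[n];   /* widen before multiplying */
    }
    *out = sum;
    return sum >= INT32_MIN && sum <= INT32_MAX;
}
```

Running such a check on the scaled wf1 and xf1 arrays before relying on the 32-bit result would show how much headroom the chosen factor of 10000 leaves.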
Conclusion
The project consolidated our knowledge on the basic aspects of assembly language
as well as ARMv7-M Instruction Set. The results of the project illustrated clearly that
a well-written assembly language function has faster execution than its C equivalent.












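Appendix:
FLOWCHART

1. START
2. Pass parameters from C to assembly function
3. Push R3-R5 to stack
4. Clear R4 to 0
5. Load $w_n$ from [R0], increment R0 by 4
6. Load $x_n$ from [R1], increment R1 by 4
7. Multiply $w_n$ and $x_n$, add to R4
8. Subtract 1 from R2, update flags
9. Z=0? YES: go back to step 5. NO: continue to step 10.
10. Pop from stack to R3-R5
11. Return to C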
