
CHAPTER 1 INTRODUCTION

1.1 Motivation

Applying engineering knowledge to biomedical applications has greatly improved the approaches for the prevention, diagnosis, and treatment of disease, for patient rehabilitation, and for improving health. For example, there have been increasing efforts to build wearable vital sign monitors that enable remote monitoring of a patient's condition, because of the great potential they offer to the medical field [20-23]. A wearable vital sign monitor can alert healthcare providers when a patient's vital signs become critical and require medical attention, especially in time-critical situations such as the onset of a heart attack. As an example, the CodeBlue project at Harvard University has developed a system that incorporates wireless pulse oximetry sensors and electrocardiogram sensors to continuously monitor and record vital sign and cardiac information from patients. These existing solutions rely on discrete commercial components to construct the wearable device. The advancement of VLSI technologies has made it possible to integrate components into a single Application Specific Integrated Circuit (ASIC) and approach a system-on-chip (SoC) solution. This lowers overall system cost, size and power consumption while improving system performance, and it reduces EMI/EMC issues by reducing the number of components on a printed circuit board (PCB). As electronic devices become smaller and prices become more competitive, an ASIC will become one of the few options left to lower system cost and reduce size further.


Our intention is to implement a portable ECG device while integrating as many of the required components as possible into a single chip.

1.2 What is an Electrocardiogram

An electrocardiogram (ECG) is a test that measures the small electrical signals produced by the heart as it beats and is used in the investigation of heart disease. It can be used to obtain information such as the heart rate, the regularity of the heartbeat, the size and position of the heart chambers, and any damage to the heart. A normal heartbeat is initiated by a small pulse of electric current that spreads from the pacemaker cells. This electrical wave spreads in a manner that causes the heart muscle to contract in an optimal way to pump blood through the body. These bioelectrical signals are typically only a few millivolts in amplitude; therefore, an amplifier is required to accurately record, display and analyze the ECG. The ECG signals are also often contaminated by strong disturbances from the electrode contact, breathing artifacts (< 0.5 Hz), and potentially 50 Hz interference from the power mains [24]. Therefore, the amplifier should also be designed to serve as a filter for the removal of these unwanted signal artifacts.


1.3 Overview of a Portable ECG

[Block diagram: 3-lead ECG from patient, Low Noise Amplifier, Analog to Digital Converter, Digital Signal Processor, Storage Element, RF Transmitter/Transceiver, wireless link, Remote Host such as PDA, smart phone or computer]

Figure 1.1 Block diagram of a portable ECG system

The block diagram of the portable ECG system that we have envisaged is shown in Figure 1.1. The Low Noise Amplifier amplifies the heart's electrical activity to a level that can be reliably registered by an Analog to Digital Converter (ADC) while removing unwanted noise signals. As the name implies, the ADC converts the analog ECG data into digital signals so that digital logic can do the necessary processing. The ADC should have a resolution of at least 10 bits to represent the actual ECG wave accurately. The digital ECG data is then processed by a Digital Signal Processor (DSP). The DSP is required to filter the ECG signals digitally [24] and to encode the ECG data for storage in a storage element and for transmission via a wireless transmitter/transceiver to a remote desktop PC, smart phone or PDA. The DSP can be implemented as a dedicated block in an IC or as firmware in a microcontroller [28].

As the amount of raw ECG data recorded over an extended period of time is enormous, the DSP can also perform data compression to reduce the size of the ECG data if it has spare processing capacity to handle it. This reduces the required capacity of the storage element and uses the wireless link's bandwidth more efficiently. Numerous compression algorithms are available that are specially optimized for ECG data [25-27]. The continuous and periodic nature of the ECG waveform can be exploited for such compression. The continuous nature ensures that the difference in amplitude between successive samples of an ECG signal is smaller than the amplitude of the sample itself. This property is used in delta encoding of ECG data. The periodic nature means that the ECG waveform is repetitive, so compression can be done using the variation of a sample from a mean value of previously acquired samples. This is done for wavelet-based compression.

In the event that the portable ECG senses an abnormality in the patient's heart activity, the remote host can automatically connect to a healthcare provider and provide useful information (such as the patient's condition and current location) so that proper action can be taken. The patient's location can possibly be obtained from the GPS receiver in a smart phone, or from the home address if the information is sent from a PC. This can potentially save a patient's life in a time-critical event when the patient requires immediate care.
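
As a minimal illustration of the delta encoding idea described above, the following Python sketch stores the first sample followed by the sample-to-sample differences and rebuilds the original samples from them. The function names and the sample values are hypothetical and are not taken from the ECG firmware; a practical encoder would additionally pack the small differences into fewer bits.

```python
from itertools import accumulate

def delta_encode(samples):
    """Return the first sample followed by successive differences."""
    if not samples:
        return []
    return [samples[0]] + [cur - prev for prev, cur in zip(samples, samples[1:])]

def delta_decode(deltas):
    """Rebuild the samples as the running sum of the deltas."""
    return list(accumulate(deltas))

# Hypothetical 12-bit ECG samples (illustrative values only)
samples = [2048, 2050, 2055, 2061, 2060, 2052]
deltas = delta_encode(samples)
restored = delta_decode(deltas)
```

After the first entry, every stored value is a small signed number, which is what makes the encoded stream cheaper to store or transmit than the raw 12-bit samples.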


1.4 Design Consideration

We aim to build a small portable ECG system capable of sampling ECG data at 12-bit resolution and a sampling rate of no less than 200 Hz, with the capability of storing ECG data on a flash memory card and a wireless link to a remote host for convenient ECG data management, while keeping the power consumption low. With this target in mind, the following design decisions were made:

1. The LNA, the ADC and an 8051-compatible microcontroller [2,15,31] are integrated into a single mixed-signal IC to reduce the number of components required and thus the PCB size. As we will see in Chapter 5, the custom-made 8051 microcontroller has ample processing power to handle sampling of ECG data at a rate of 200 Hz and can easily be interfaced to an SD card. The built-in UART can also be used to interface a Bluetooth module. To further reduce the number of components required on board, the 8051 is also designed with the ability to boot from the SD card, removing the need for another non-volatile memory on board. Having a general purpose microcontroller as the main core of our design also allows the flexibility of implementing different functions in software/firmware rather than having to design specific hardware for each function [30].

2. An SD card is chosen as the storage element because of its small form factor compared with other commercially available removable flash memory cards, as well as its large storage capacity. Storing 24 hours of uncompressed 3-lead, 12-bit ECG data sampled at 200 Hz requires approximately 100 MB (assuming 2 bytes are used to store each 12-bit sample to simplify processing).

3. A Bluetooth wireless module is used as the wireless transceiver. Bluetooth is a well-known low-power wireless solution that is widely used in portable devices. The portable ECG can conveniently set up a Bluetooth link with a mobile phone and automatically dial a medical service provider when needed.
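
The storage estimate in item 2 can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function and parameter names are our own, and the defaults are the figures quoted above (24 hours, 3 leads, 200 Hz, 2 bytes per sample).

```python
def ecg_storage_bytes(hours=24, leads=3, sample_rate_hz=200, bytes_per_sample=2):
    """Uncompressed storage needed for a continuous multi-lead recording."""
    samples_per_lead = hours * 3600 * sample_rate_hz
    return samples_per_lead * leads * bytes_per_sample

total_bytes = ecg_storage_bytes()
total_mb = total_bytes / 1e6   # decimal megabytes
```

With the default values this gives 103,680,000 bytes, in line with the figure of approximately 100 MB.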

1.5 Objective

Owing to the scale of the project and the limited time available, the work to build the portable ECG device is divided into several smaller components that are assigned to several individuals to be completed separately. This report describes the effort to prepare the digital baseband portion of the building blocks required for the portable ECG. This includes the design and implementation of an improved version of the Intel 8051 microcontroller in the 0.35 µm CMOS process by Austriamicrosystems (AMS), testing of the 8051 chip, and the development of the firmware code and PCB required to interface the 8051 to a FAT16-formatted SD card. The analog portion of the mixed-signal chip (a low noise amplifier together with a 12-bit analog to digital converter) is implemented by another research scholar and is combined with the 8051 design into a single chip. As the analog portion of the ECG chip required for ECG data acquisition was not ready during the candidature period, a working ECG device was not built. However, the firmware for interfacing an SD card developed in this project can be adapted to perform the functions of an ECG device with minor changes. The final development of the PCB and firmware code for a working ECG device is left to other candidates and will be based on the building blocks described in this report.


1.6 Organization of Report

This report is divided into the following chapters:

Chapter 1. Introduction
This chapter starts by giving the motivation for the project and wraps up by setting the objectives for this portion of the project to implement a portable ECG.

Chapter 2. Design of s8051 Microcontroller
Defines the functional specifications required of the s8051 microcontroller, which will be the main controller of the ECG device.

Chapter 3. Implementation of s8051 Microcontroller
Goes through the digital IC design flow used to implement the s8051 microcontroller.

Chapter 4. Building Development Tools and Testing of s8051 Chip
Describes the effort to build the necessary development tools and to turn on and test the fabricated s8051 chip.

Chapter 5. Interfacing SD Card Boot Loader and Bluetooth Serial Port Profile
Describes the firmware implementation of an SD card boot loader and how a Casira Bluetooth module can be interfaced to the s8051. Also addresses the processing power requirement of the s8051.

Chapter 6. Conclusion
Concludes this report and presents future improvements.


CHAPTER 2 DESIGN OF S8051 MICROCONTROLLER

The s8051 has been designed taking into consideration the features required by the portable ECG device and the processing power needed to implement the required functions. Having our own 8051 block has many advantages over using an off-the-shelf 8051 chip. In our project, it is motivated by the need to build our own custom ECG ASIC, in which we have integrated a high performance ADC and LNA designed separately by another research scholar. As the ECG project progresses, we will also be able to integrate more blocks into the single chip to perform specific functions required by the ECG. These may include an SDIO controller, a USB controller, a flash memory controller and so on. This chapter outlines the main features of the s8051 that differ from the original Intel 8051 (i8051), as well as the changes and improvements made to the s8051 design compared with the original open source RTL downloaded from UCR [15], which we refer to as the c8051 in this report.


2.1 Architecture Overview of s8051 Design

The 8051 is a general purpose 8-bit MCU architecture suitable for most embedded system applications that require moderate processing capability, which makes it suitable for implementing an ECG device.

[Block diagram: s8051 containing the CPU (ALU, decoder), serial port, timer/counter, 128 bytes RAM, interrupt controller and 4 I/O ports (P0, P1, P2, P3), and the external memory interface (ADDR, DATA, RD, WR), with CLK and RST inputs]

Figure 2.1 Block diagram of the s8051 core

Table 2.1 Features Comparison of c8051, s8051 and i8051

Feature                                                  s8051   c8051   i8051
8-bit CPU                                                yes     yes     yes
Extensive Boolean processing capabilities                yes     yes     yes
64K Program Memory address space                         yes     yes     yes
64K Data Memory address space                            yes     yes     yes
128 bytes of on-chip Data RAM                            yes     yes     yes
32 bidirectional and individually addressable I/O lines  yes     yes     yes
Two 16-bit timer/counters                                yes     no      yes
Full duplex UART                                         yes     no      yes
Interrupt structure                                      yes     no      yes
Oscillator periods per machine cycle                     8       16      12
MIPS at 50 MHz                                           6.25    3.13    4.17

The s8051 MCU architecture is summarized in Figure 2.1 and is compared with i8051 as well as c8051 in Table 2.1.


As we can see in Table 2.1, the s8051 implemented in this project offers the full feature set of the i8051 and a higher instruction throughput than both the c8051 and the i8051. Although the Intel 8051 was designed to operate at 24 MHz, all three 8051 designs are assumed to operate at 50 MHz so that a direct architectural comparison can be made. The original open source c8051 RTL lacks several features found in the standard Intel 8051 (UART, timers and interrupt system). These features are important for our project and are therefore implemented and incorporated into our s8051 design according to the specifications documented in the Intel 8051 datasheet to ensure compatibility with standard 8051 compilers. The UART is needed to communicate with an external Bluetooth module. The interrupt structure ensures that the system can respond to events promptly and is a key feature for real-time applications. The timer block generates the baud rate required by the UART and serves any other timing requirement. To increase the processing capability of the s8051, the internal state machine is also modified to improve performance.
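
The MIPS row of Table 2.1 can be reproduced with the short sketch below. It assumes, as the table values suggest, that the figure counts one instruction per machine cycle at a 50 MHz clock; the dictionary and function names are ours.

```python
# Oscillator periods per machine cycle, as listed in Table 2.1
CLOCKS_PER_MACHINE_CYCLE = {"s8051": 8, "c8051": 16, "i8051": 12}

def peak_mips(clock_mhz, clocks_per_machine_cycle):
    """Peak throughput, assuming one instruction per machine cycle."""
    return clock_mhz / clocks_per_machine_cycle

mips_at_50mhz = {
    name: peak_mips(50, clocks)
    for name, clocks in CLOCKS_PER_MACHINE_CYCLE.items()
}
```

This is an upper bound: instructions that need more than one machine cycle lower the achieved throughput.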

2.2 Performance Comparison between i8051, c8051 and s8051

The main difference between the s8051 design implemented in this project and the i8051 and c8051 is that the s8051 design is optimized with a 2-stage pipeline and branch prediction, which reduces the average number of clocks per instruction (CPI). In addition, most instructions are executed in one machine cycle of 8 clock cycles on the s8051, except for 24 instructions which we will discuss later in this chapter. According to the Intel 8051 instruction set [2], the Intel 8051 has 64 instructions that take 12 clock cycles, 45 that take 24 clock cycles, and 2 that take 48 clock cycles. Assuming that a program has an equal distribution of each instruction, the Intel 8051 would have an average CPI of

\[ \frac{64 \times 12 + 45 \times 24 + 2 \times 48}{64 + 45 + 2} = 17.51 \text{ CPI}. \]

Similarly, the average CPI for the s8051, assuming evenly distributed instructions and no branch prediction, is

\[ \frac{87 \times 8 + 24 \times 16}{111} = 9.73 \text{ CPI}. \]

The original c8051 has a fixed CPI of 16. This implies that the s8051 is roughly 1.8 times faster than a standard Intel 8051 and 1.64 times faster than the c8051 at the same operating frequency. However, this is only a rough estimate, as the instruction distribution in a real program is unlikely to be perfectly even. Furthermore, the branch prediction algorithm implemented is likely to reduce the average CPI further. As we will see in the simulation results towards the end of this chapter, where the 8051 at different development stages executes actual software, the performance of the s8051 is almost double that of the c8051.
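
The average CPI figures and speed-up ratios quoted in this section can be recomputed with the following sketch. The instruction counts and clock counts are those given in the text; the helper name average_cpi is our own, and the even instruction mix is the same simplifying assumption used above.

```python
def average_cpi(groups):
    """Average clocks per instruction for (instruction_count, clocks) groups,
    assuming every instruction occurs equally often."""
    total_clocks = sum(count * clocks for count, clocks in groups)
    total_instructions = sum(count for count, _ in groups)
    return total_clocks / total_instructions

i8051_cpi = average_cpi([(64, 12), (45, 24), (2, 48)])
s8051_cpi = average_cpi([(87, 8), (24, 16)])   # no branch prediction
c8051_cpi = 16.0                               # fixed 16 clocks per instruction

speedup_vs_i8051 = i8051_cpi / s8051_cpi
speedup_vs_c8051 = c8051_cpi / s8051_cpi
```

Replacing the even weights with measured instruction frequencies from a real program would give a more realistic estimate.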


2.3 8051 State Machine

The 8051 state machine is responsible for the sequence in which each instruction is executed. It is therefore the key factor in the MCU's performance.

Figure 2.2 State sequence in i8051

The machine cycle of the i8051 consists of a sequence of 6 states, numbered S1 through S6. Each state is divided into a Phase 1 half and a Phase 2 half, so each machine cycle is 12 oscillator periods long.


[Timing diagram: CLK with states S1 to S16; decode stages (read opcode, read 2nd byte, read 3rd byte) and execution stages (execution of instructions)]

Figure 2.3 State Sequence in c8051

As shown in Figure 2.3, the c8051 executes an instruction in 16 clock cycles. The state machine is not optimized for performance, which leaves room for improvement.

[Timing diagram: CLK with states S1 to S8 per machine cycle; the decoding pipeline reads the Nth opcode and its next byte, then the (N+1)th opcode and its next byte, while the execution pipeline executes the (N-1)th opcode and then the Nth opcode]

Figure 2.4 State Sequence in s8051


Figure 2.4 shows that the state sequence in the s8051 consists of 8 clock cycles per machine cycle, a two-fold improvement over the c8051. This is achieved by dividing the c8051 machine cycle into two stages, decode and execute, and running both stages in parallel. We can therefore execute one instruction while decoding the next instruction in advance. The following sections describe the considerations involved in re-architecting the 8051 in this way.

2.4 Parallelism with 2 Stages Pipeline

To improve the performance of the s8051 without increasing the clock frequency, the decoding and execution of instructions are divided into a two-stage pipeline to exploit parallelism [29,32]. Care has to be taken when implementing the pipeline. Executing two pipeline stages concurrently requires the stages to be independent of each other. The decode stage is not independent of the execute stage in the following situations:

1. Memory I/O dependency, where the decode stage fetches instructions from the external memory while the execute stage is accessing the external memory at the same time.

2. Flow dependency, where the decode stage increments the program counter to fetch the next instruction while the execute stage is executing a branch instruction that updates the program counter to the branch target.


2.4.1 Memory I/O dependency

There are 16 instructions that cause the execute stage to access the external memory. They are 2 program memory read instructions (MOVC A, @A+DPTR; MOVC A, @A+PC), 2 external RAM read instructions (MOVX A, @Ri; MOVX A, @DPTR), 2 external RAM write instructions (MOVX @Ri, A; MOVX @DPTR, A) and 10 other instructions which require 3 bytes to execute (ANL direct, #data; CJNE A, direct, rel; CJNE A, #data, rel; CJNE Rn, #data, rel; CJNE @Ri, #data, rel; DJNZ direct, rel; JB bit, rel; JBC bit, rel; JNB bit, rel; LCALL addr16; LJMP addr16; MOV direct, direct; MOV direct, #data; MOV DPTR, #data16; ORL direct, #data; XRL direct, #data) [2]. As we will see in Section 2.5, Memory Access Sequence, the instructions that require 3 bytes are executed in two machine cycles because the s8051 can only fetch 2 bytes of op code in one machine cycle. The 3rd byte is fetched by the execute stage during the 2nd machine cycle. To avoid clashes between the two pipeline stages while executing these instructions, the execute stage signals the decode stage to stop fetching from memory by asserting skip_dec = 1 at the 1st clock cycle of the machine cycle, and sets skip_dec = 0 at the last clock cycle of the machine cycle to allow the decode stage to fetch instructions during the next machine cycle.

2.4.2 Program Flow dependency

During the execution of normal (non-branch) instructions, the program counter is incremented sequentially. The microcontroller can therefore save time by decoding one instruction while the preceding instruction is being executed.


However, during a branch instruction, the execute stage updates the program counter to the branch target. The decode stage can therefore only decode the correct next instruction after the execute stage has finished executing the branch instruction. This is a flow dependency of the decode stage on the execute stage. Likewise, the execute stage can only execute the next instruction after the decode stage has fetched the correct instruction from the updated program counter. This is a flow dependency of the execute stage on the decode stage. To satisfy these flow dependencies, the execute stage asserts skip_inst = 1 to signal the decode stage to discard the instruction wrongly decoded during a branch instruction and replace it with the NOP instruction, so that the execute stage does not execute the wrongly decoded instruction. A latency of one machine cycle is therefore expected every time a branch instruction is executed. There are 15 program branching instructions in the 8051 instruction set, of which 7 require 3 bytes of op code to execute. For these 7 instructions, a delay of one machine cycle is already imposed to avoid the memory I/O dependency between the two pipeline stages, as discussed in the previous section. Therefore, only 15 - 7 = 8 branch instructions are affected by the program flow dependency.

2.4.3 Branch Prediction

Incurring a latency every time a branch instruction is encountered can slow down the microcontroller significantly, especially if the program being executed has many jump/loop instructions. A simple branch prediction algorithm is implemented to address this problem: 162 bits (~20 bytes) of cache memory are allocated to store records of the three most recent branches taken. The cache memory stores the states of the internal registers required to make a jump to the branch target. Each of the 3 branch records includes 16 bits for the branch target address, 16 bits for the current program counter, 14 bits for the 2 bytes of op code of the instruction at the branch target (the 2 most significant bits of the 1st op code are omitted), 1 bit indicating whether the instruction at the branch target requires a 3rd op code to be fetched, and 7 bits for the decoded op code of the instruction at the branch target.

If the instruction currently being executed by the s8051 is a branch instruction that has previously been recorded in the cache memory, the recorded instruction is loaded directly into the execution stage's pipeline without going through the decode stage. The decode stage simply discards its contents, since they were decoded from an outdated program memory address. If no record of the branch instruction has been stored in the cache memory, the decode stage loads the NOP instruction (do nothing) into the execution stage. One machine cycle elapses while the decode stage decodes the next instruction after the branch and the execute stage waits for it. The normal fetch and execute sequence then continues until the next branch is encountered.

Consider the following code sequence, which decrements the accumulator 3 times in a loop and then moves its value into external memory:

        ; move the value 3 into accumulator
        mov a, #3
    label1:
        ; decrement accumulator
        dec a
        ; loop to label1 if accumulator is not 0
        jnz label1
        ; move the value in accumulator into
        ; external memory pointed by the data pointer
        movx @dptr, a


[Timeline: decode stage and execute stage shown side by side over time for the code sequence above. When the first jnz label1 jumps, skip_inst = 1 is signalled, the wrongly decoded movx @dptr, a is replaced by a NOP and the decode stage decodes dec a and stores it in cache memory ("branch instruction encountered, causes latency delay while decode stage stores branch history"). When the second jnz label1 jumps, dec a is loaded from cache memory ("branch instruction encountered, no latency delay as branch history is available in cache memory"). When the accumulator is 0, execution proceeds to movx @dptr, a, which signals skip_dec = 1 and is followed by a NOP ("execute stage needs to access the memory IO, therefore decode stage halt for one machine cycle")]

Figure 2.5 Flow Chart of s8051 2 stages pipeline


Figure 2.5 summarizes how the decode and execute stages handle the memory I/O dependency, the branch flow dependency and the branch prediction algorithm for this short sequence of 8051 assembly code.
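
The behaviour summarized in Figure 2.5 can be approximated by a small cycle-counting model. The Python sketch below is not the s8051 RTL: it only counts machine cycles for a trace of executed instructions, charging one extra cycle for a taken branch that has no record (skip_inst) and one extra cycle for an instruction that accesses external memory (skip_dec). The trace format, the function name and the replacement policy (least recently used record evicted first) are our assumptions, as are the addresses, which assume the code starts at address 0.

```python
from collections import OrderedDict

# Bits per branch record as listed in the text: target address, current PC,
# two op code bytes (less 2 bits), 3rd-byte flag, decoded op code.
RECORD_BITS = 16 + 16 + 14 + 1 + 7
assert 3 * RECORD_BITS == 162

def machine_cycles(trace, records=3):
    """Count machine cycles for a trace of (pc, kind, taken) tuples.

    kind is "plain", "branch" or "mem". Use records=0 to model the
    pipeline without branch prediction.
    """
    cache = OrderedDict()  # PC of a recorded branch, least recently used first
    cycles = 0
    for pc, kind, taken in trace:
        cycles += 1                        # the instruction itself
        if kind == "mem":
            cycles += 1                    # skip_dec: decode halted, NOP follows
        elif kind == "branch" and taken:
            if pc in cache:
                cache.move_to_end(pc)      # hit: target comes from the record
            else:
                cycles += 1                # skip_inst: wrong instruction becomes NOP
                if records:
                    cache[pc] = True
                    if len(cache) > records:
                        cache.popitem(last=False)  # assumed eviction policy
    return cycles

# Executed instructions of the loop example
trace = [
    (0, "plain", False),   # mov a, #3
    (2, "plain", False),   # dec a      (a = 2)
    (3, "branch", True),   # jnz label1 (taken, no record yet)
    (2, "plain", False),   # dec a      (a = 1)
    (3, "branch", True),   # jnz label1 (taken, record available)
    (2, "plain", False),   # dec a      (a = 0)
    (3, "branch", False),  # jnz label1 (falls through)
    (5, "mem", False),     # movx @dptr, a
]

with_prediction = machine_cycles(trace)
without_prediction = machine_cycles(trace, records=0)
```

For this trace the model counts 10 machine cycles with the branch records and 11 without them, so each repeated iteration of a loop saves one machine cycle once its branch has been recorded.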

2.5 Memory Access Sequence

The c8051 was originally designed to execute instructions from a non-standard internal memory whose access sequence makes it inefficient for practical implementation. As the cost per unit of storage of integrating one-time programmable ROM on the same silicon is high, the s8051 is designed to execute its program from an external memory chip. It is therefore important that the memory access sequence follows the memory chip's specifications to ensure proper timing.

[Timing waveforms of Clk, Addr. Bus, Data Bus and Rd: execution from internal ROM, with data sampled after 1 clock cycle of access time, and execution from external ROM, with data sampled after 3 clock cycles of access time; 1 machine cycle = 8 clock cycles]

Figure 2.6 Timing waveform of c8051 and s8051 program fetch sequence

As shown in Figure 2.6, when the c8051 executes from its internal ROM, only 1 clock cycle of access time is allowed for the program memory to react to a change on the address bus and provide valid data to the 8051. This is only possible if the 8051 executes directly from an internal ROM that is part of the 8051 core. If an external memory (such as a flash memory) is used, a longer access time may be needed. A typical flash memory has an access time of around 70 ns [4]. This means that, if the memory access sequence is not modified, the maximum clock frequency at which the s8051 can operate without violating the flash memory's access time is

\[ \frac{1}{70\ \text{ns}} \approx 14\ \text{MHz}. \]

Since the s8051 is expected to execute from an external memory chip, the whole memory access sequence is redesigned to allow 3 clock cycles of access time, as shown in Figure 2.6. This allows an operating frequency of up to

\[ \frac{3}{70\ \text{ns}} \approx 42.9\ \text{MHz} \]

without causing an access time violation, provided that the final s8051 IC is capable of operating at this frequency. As a consequence of allowing 3 clock cycles of memory access time, the new s8051 design can only fetch two op codes from memory in one machine cycle. This causes the 16 instructions in the 8051 instruction set that are 3 bytes in length to be executed in 2 machine cycles instead of one, which naturally increases the average CPI of the s8051. However, as we will see in the simulation waveforms in the last section of this chapter, the increase in CPI due to the new memory access sequence is not significant.
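
The relation between the memory access time and the maximum clock frequency can be written as a one-line helper. This is only a sketch of the arithmetic used above; the function name is ours, and the 70 ns figure is the typical flash access time quoted from [4].

```python
def max_clock_mhz(access_time_ns, access_clocks):
    """Highest clock frequency (MHz) at which `access_clocks` clock periods
    still cover the memory access time."""
    return access_clocks * 1000.0 / access_time_ns

original_limit = max_clock_mhz(70, 1)    # 1 clock of access time
redesigned_limit = max_clock_mhz(70, 3)  # 3 clocks of access time
```

The same helper can be used to check other memory chips by substituting their access time.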


2.6 Port Structure and Addition of Tri-state Special Function Registers

The port structure of the s8051 microcontroller differs from that of the Intel 8051 because the digital periphery cell library of the AMS 0.35 µm CMOS process does not have the bidirectional buffers required to imitate those of the Intel 8051. Furthermore, the s8051 has dedicated address and data buses as well as read and write control pins to interface with an external memory chip, so these lines are not multiplexed with the 8051 I/O ports.

2.6.1 Port Structure of Original Intel 8051

Figure 2.7 Intel 8051 Port Bit Latches and I/O Buffers

Figure 2.7 illustrates the Intel 8051 port bit latches and I/O buffers [2]. The output drivers of Ports 0 and 2 are switchable to an internal address and address/data bus. Ports 1, 2, and 3 have internal pull-ups, while Port 0 has open-drain outputs. Each I/O line can be used independently as an input or an output.


However, Ports 0 and 2 may not be used as general purpose I/O while they are being used as the address/data bus. If the s8051 design were to follow the Intel 8051 by implementing the address/data bus on these two ports, Port 0 and Port 2 would be useless as general purpose I/O, since there is no built-in programmable ROM in the s8051 core. To be used as an input, the port bit latch must contain a 1, which turns off the output driver FET. Then, for Ports 1, 2, and 3, the pin is pulled high by the internal pull-up but can be pulled low by an external source. When pulled low by an external source, Ports 1, 2 and 3 will source current. Only Port 0 is considered truly bidirectional, because it floats when configured as an input.

2.6.2 Port Structure of s8051

The port structure of the s8051 is designed around the available periphery cell library [3] while maintaining as much compatibility with the Intel 8051 as possible. The clock input buffer is implemented using the ICCK2P pad-limited CMOS clock input buffer with 2 mA drive strength, and the reset input using the ICUP pad-limited CMOS input buffer with pull-up. The address bus and the read and write control pins are implemented using the BU2P pad-limited output buffer with 2 mA drive strength. The data bus and Ports 0, 2 and 3 are implemented using the BBC4P pad-limited CMOS bidirectional buffer with 4 mA drive strength. Port 1 is implemented using the BBCU4P pad-limited CMOS bidirectional buffer with 4 mA drive strength and pull-up. The bidirectional buffer with pull-up is chosen for Port 1 so that at least one port can be used as an input port without additional external pull-up resistors on the PCB when inputs that need to be pulled high (e.g. push buttons or the input of an SPI bus) are required.

Since the internal pull-up resistors on the I/O buffers supplied by the periphery cell library are not meant for driving external loads, true bidirectional buffers, similar to Port 0 of the Intel 8051, are implemented for the I/O ports of the s8051. Four additional bit-addressable special function registers (TRIS0, TRIS1, TRIS2 and TRIS3) are added to the s8051 design; they determine whether the 4 bidirectional I/O ports are tri-stated for input or set as outputs. These registers are mapped to internal memory locations 0xC8, 0xD8, 0xE8 and 0xF8 respectively. These locations are not occupied by any of the Intel 8051's original special function registers and lie beyond the 128 bytes of internal RAM of the s8051. Figure 2.8 shows the I/O buffers from the periphery cell library used in our design. The s8051 port bit latches and I/O buffers are shown in Figure 2.9.


Figure 2.8 I/O Buffers Used From the Periphery Cells Library


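[Schematics: Port 1 uses a BBCU4P buffer driven by sfr_p1 and sfr_tris1; Port 0, Port 2 and Port 3 (bit 2 to bit 7) use a BBC4P buffer driven by the port latch (shown as sfr_p0) and the tri-state register (shown as sfr_tris0); Port 3 (bit 0 and bit 1, with alternate functions) uses a BBC4P buffer driven by sfr_p3 and sfr_tris3, with a selection between the alternate function and sfr_p3 depending on whether the alternate function is enabled. In each case input A and output Y of the buffer connect to the internal side, Y returns to the internal bus, and PAD connects to the external pin]

Figure 2.9 s8051 Port Bit Latches and I/O Buffers
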
The 4 tri-state special function registers (TRIS0, TRIS1, TRIS2 and TRIS3) are initialized to 255 (0xFF) to improve compatibility with programs written for the Intel 8051. When such programs are executed on the s8051, an I/O pin is tri-stated (instead of being pulled up by the internal pull-up resistor as on the Intel 8051) and acts as an input when logic 1 is written to it. Under normal circumstances, the 4 internal memory locations are not touched by such programs, as they are not occupied by any special function registers or memory bytes in an Intel 8051.
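
The following Python sketch models one possible reading of this behaviour. It is an assumption on our part, not a description of the actual gate-level logic in Figure 2.9: a pin is taken to be released (tri-stated) only when both its TRIS bit and its port latch bit are 1, and to drive the latch value otherwise. The register addresses and the reset value are those given in the text; the function names are hypothetical.

```python
TRIS_RESET_VALUE = 0xFF  # reset value of TRIS0..TRIS3 stated in the text
TRIS_ADDRESSES = {"TRIS0": 0xC8, "TRIS1": 0xD8, "TRIS2": 0xE8, "TRIS3": 0xF8}

def pin_state(latch_bit, tris_bit):
    """Return 'Z' (tri-stated input), '0' or '1' (driven) for one port pin.

    ASSUMPTION: the pin is released only when both the TRIS bit and the
    latch bit are 1; otherwise the latch value is driven onto the pin.
    """
    if tris_bit and latch_bit:
        return "Z"
    return "1" if latch_bit else "0"

def port_state(latch, tris=TRIS_RESET_VALUE):
    """Pin states of an 8-bit port as a string, bit 7 first."""
    return "".join(
        pin_state((latch >> bit) & 1, (tris >> bit) & 1)
        for bit in range(7, -1, -1)
    )

legacy = port_state(0xF0)                # program written for the Intel 8051
push_pull = port_state(0xF0, tris=0x00)  # all TRIS bits cleared
```

Under this reading, a legacy program that writes 1 to a pin before reading it sees a floating input instead of a weak pull-up, while clearing a TRIS bit lets the pin drive both logic levels.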


2.7 SD Card Boot Loader

The final s8051 design sent for fabrication includes the additional ability to load its program from an SD card. While the SD card is intended to store large amounts of ECG data, it can also act as non-volatile storage for the 8051's firmware. This feature eliminates the need to interface an additional external ROM to the s8051 and allows the 8051 firmware to be updated simply by dragging and dropping a binary file from Windows onto the SD card. It is also motivated by the need to convince ourselves that the fabricated 8051 is able to interface with an SD memory card at an acceptable speed. To achieve this, the 8051 starts execution from an internal ROM, which stores a fixed, pre-designed program that loads binary data from a file on the SD card (e.g. program.bin) into an external SRAM chip, then resets the internal registers and continues execution from the external SRAM.


[Figure 2.10 s8051 (with SD Card Loader) Block Diagram. Blocks shown: CPU, reset logic, memory interface, internal ROM, internal RAM, I/O ports and other logic blocks. Signals shown: clk, rst, sd, int_rst (internal reset signal), pcon(6), irom, irom_rd, irom_addr, irom_data, addr, data, rd, wr, xram_e, x_addr, x_data, x_rd, x_wr.]

Two additional logic blocks are added: the reset logic block and the memory interface block. The memory interface block receives memory access information (data, addr, rd, wr) from the CPU and, depending on the status of the irom signal, decides whether code is executed from the internal ROM or from the external memory. The reset logic block resets all internal logic blocks based on the status of the external rst and sd pins and on whether the 6th bit of the PCON special function register, pcon(6), is set. If sd = 0, indicating that the program will not be loaded from the SD card, int_rst follows the external rst. Otherwise, int_rst is asserted (driven to 0) for 14 machine cycles whenever 1 is written to pcon(6) while executing from the internal ROM. When sd = 1, the reset logic also holds irom = 1 until logic 1 is written to pcon(6). This ensures that all internal registers are reset before execution switches from the internal ROM to the external memory.


[Figure 2.11 Reset Logic State Machine when sd = 1. State 0: int_rst = '1', irom = '1'; execution from internal ROM; the machine remains in state 0 while pcon(6) = 0 and leaves it when pcon(6) = 1. State 1 to state 14: int_rst = '0', irom = '0'; reset pulse of 14 clock cycles to reset the internal registers. State 15: int_rst = '1', irom = '0'; execution successfully switched to external memory.]
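
The state sequence of Figure 2.11 can be illustrated with a small behavioural model. The following Python sketch is an illustration only and is not the s8051 RTL: the class and method names are hypothetical, and it assumes one state transition per call to `step`.

```python
class ResetLogic:
    """Behavioural sketch of the reset logic for sd = 1 (hypothetical names)."""

    PULSE_STATES = 14  # states 1 to 14 hold int_rst low

    def __init__(self):
        self.state = 0  # state 0: executing from the internal ROM

    def outputs(self):
        """Return (int_rst, irom) for the current state."""
        if self.state == 0:
            return (1, 1)  # no reset, fetch from internal ROM
        if self.state <= self.PULSE_STATES:
            return (0, 0)  # reset pulse, external memory selected
        return (1, 0)      # state 15: running from external memory

    def step(self, pcon6):
        """Advance one step; pcon6 is the current value of pcon(6)."""
        if self.state == 0:
            if pcon6:
                self.state = 1  # the loader wrote 1 to pcon(6)
        elif self.state <= self.PULSE_STATES:
            self.state += 1     # count through the reset pulse
        # state 15 is terminal: pcon(6) has no further effect
        return self.outputs()


fsm = ResetLogic()
trace = [fsm.step(0), fsm.step(1)] + [fsm.step(0) for _ in range(14)]
print(trace[0], trace[1], trace[-1])
```

In this sketch the reset pulse lasts exactly 14 steps, after which writing to pcon(6) no longer changes the outputs.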


Compatibility with the Intel 8051 is maintained, since the 6th bit of the PCON special function register is a reserved bit on the Intel 8051. When executing from external memory, pcon(6) has no additional function and writing to it has no effect. Details of how the SD card is interfaced to the s8051 are studied in Chapter 8.


2.8

Simulation Results and Benchmark Showing the Performance of s8051 at Different Development Stages

As described in the previous sections of this chapter, the improvement of the c8051's RTL can be divided into several stages before arriving at the final s8051. These stages are:

Stage1 : Original c8051 RTL from the UCR Dalton Project.
Stage2 : State machine pipelined (divided into decode and execute stages).
Stage3 : Branch prediction algorithm implemented.
Stage4 : Refined memory access sequence to allow interfacing with an external memory chip.
Stage5 : Changes to the RTL to use CMOS process specific library cells (SRAM macro, I/O pads, etc.).

Simulations were performed at each stage of functional change to the c8051 design (except Stage5, which does not involve functional or architectural changes), both to show any impact on performance and to verify our design. Unfortunately, a simulator of the actual Intel 8051 architecture was not available to be included in this benchmark for comparison. However, the s8051 can be expected to be at least 1.5 times faster than the i8051 at the same clock frequency, and the only question is how much better than that it is. This is because the s8051's machine cycle is only eight clocks long while that of the i8051 is twelve clocks long (12/8 = 1.5), and because the i8051 has more instructions that take multiple machine cycles than the s8051 does. The simulation waveforms benchmarking our s8051 at the different stages are shown in the following pages.


Figure 2.12 Simulation Waveform of 8051 (Stage1) executing testall.c

Figure 2.13 Simulation Waveform of 8051 (Stage2) executing testall.c


Figure 2.14 Simulation Waveform of 8051 (Stage3) executing testall.c

Figure 2.15 Simulation Waveform of s8051 (stage4) executing testall.c


Figure 2.12 to Figure 2.15 show the functional simulation waveforms (from ModelSim) of the different development stages of the 8051, clocked at 50 MHz and executing testall.c. The testall.c program is the program we used as a test bench for our 8051 design; it tests the ability of an 8051 compatible processor to execute most of the 8051 instructions (all except ACALL, LCALL, RET, RETI, and MOVX). If any instruction fails, the program outputs the instruction number to P1 and stops. Otherwise, 127 is output to P1 once all instructions have executed correctly. By inspecting the simulated execution time at the different development stages, we can deduce the performance impact of the changes we made. As shown in the figures, the execution times at Stage1, Stage2, Stage3, and Stage4 are 16.8 ms, 10.5 ms, 8.44 ms and 8.47 ms respectively. This means that the pipeline design alone improves the performance by a factor of 16.8/10.5 = 1.6, and that branch prediction contributes a further factor of 10.5/8.44 ≈ 1.24, so that Stage3 is about twice as fast as the original c8051 (16.8/8.44 ≈ 2.0).
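
The speed-up factors can be checked directly from the simulated execution times. The short Python sketch below uses only the four execution times quoted above; the dictionary and function names are chosen here for illustration.

```python
# Simulated execution times of testall.c at 50 MHz, in milliseconds (from the text).
exec_ms = {"Stage1": 16.8, "Stage2": 10.5, "Stage3": 8.44, "Stage4": 8.47}


def speedup(before, after):
    """Ratio of execution times: values above 1 mean `after` is faster."""
    return exec_ms[before] / exec_ms[after]


pipeline_gain = speedup("Stage1", "Stage2")  # effect of the 2-stage pipeline
branch_gain = speedup("Stage2", "Stage3")    # additional effect of branch prediction
overall_gain = speedup("Stage1", "Stage3")   # combined effect over the original c8051

print(f"{pipeline_gain:.2f} {branch_gain:.2f} {overall_gain:.2f}")
```

The overall gain is the product of the two individual gains, which is why the pipeline (1.6) and branch prediction (about 1.24) together give roughly a factor of two.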

Figure 2.16 Simulation Waveform of 8051 (Stage2) executing loop.c


Figure 2.17 Simulation Waveform of 8051 (Stage3) executing loop.c

To further illustrate the performance enhancement from the branch prediction algorithm, another program, loop.c, in which the 8051 increments a counter in an infinite while loop, is executed at Stage2 and Stage3. The branch prediction effectively reduces the number of machine cycles required for a branch instruction from two to one. It can also be seen that the performance impact of the changes to the memory access sequence from Stage3 to Stage4 is insignificant. The new memory access sequence increases the execution time by only

$$\frac{(8467670 - 8442590)\,\text{ns}}{8442590\,\text{ns}} \times 100\% \approx 0.3\%.$$


2.9

Chapter Summary

An Intel 8051 compatible microcontroller is designed as a block in an ongoing effort to build a custom ECG chip. The s8051 design in this project is based on c8051, an open source RTL from UCR Dalton. The c8051's VHDL code is not sufficiently optimized for performance, nor is it practical for interfacing with standard memory chips. It also lacks other features found in a standard Intel 8051, such as the UART, timers and interrupt structure. All these shortcomings have been addressed in this project by re-architecting the 8051 core for performance and by implementing the standard Intel 8051's built-in features, in addition to the ability to boot from an SD card. While we try to retain maximum compatibility with the i8051, the s8051 deviates from it in several respects because of limitations of the CMOS library cells from the silicon vendor and because of feature enhancements specific to our project. The 2-stage pipeline design together with a simple branch prediction algorithm improves the performance of the s8051 by a factor of two compared to the original c8051 we obtained.


CHAPTER 3 IMPLEMENTATION OF S8051

The implementation of the s8051 on the AMS 0.35 µm CMOS process follows the standard digital IC design flow.

3.1

Digital IC Design Flow

[Figure 3.1 Digital IC Design Flow. Steps shown, in order: Functional Specifications; Design Entry (HDL Coding); Simulations; Synthesis; Static Timing Analysis; Place and Route; Static Timing Analysis.]


The design flow adopted in this project is shown in Figure 3.1. The first step of the design flow is to define all the functional specifications and to code them as RTL in an HDL (VHDL in our case).


To verify that the functional specifications are coded correctly in the HDL, functional simulations are performed using the ModelSim VHDL simulator, with test benches created to provide the stimulus to the RTL design. Any functional mismatches are detected by examining the simulation waveforms, and the design entry step is repeated until all functionality is correct. The Synopsys Design Compiler is used to synthesize the RTL design into a gate-level netlist, which is saved in Verilog and in the Synopsys .db file format. Timing and functional verification are then performed again on the synthesized netlist. As the University's facility does not include adequate CAD tools with VHDL support, Verilog HDL is used from the post-synthesis stage onwards. Because the netlist is saved in Verilog format, Cadence NC Verilog is used instead of the ModelSim simulator, and new test benches written in Verilog are needed. The synthesized netlist is then input to Cadence Silicon Ensemble for automatic place and route to generate the final layout to be sent for fabrication.

3.2

Synthesis

After the design has been verified through functional simulation, synthesis is performed. The synthesis stage translates the VHDL design and maps it onto the logic cells provided in the technology library to generate an optimized gate-level netlist file, which describes the actual circuit to be implemented at a low level. Synthesis tools allow complex designs to be completed in a shorter period of time, significantly reducing design time and cost. The Synopsys Design Compiler is the synthesis tool used to synthesize the design into an AMS 0.35 µm CMOS technology netlist.


There are two ways to invoke the Design Compiler: through Design Analyzer (Design Compiler's graphical user interface) or through dc_shell. The graphical user interface is used here to simplify the synthesis process.

3.2.1

Setting Up Design Compiler

Before Design Analyzer is invoked, the Design Compiler has to be set up to use the 0.35 µm CMOS technology from AMS. In addition to the process library, the Synopsys database file sram128x8.db, which contains the timing and functional information of the 128-byte SRAM IP from AMS, is loaded into Design Compiler in order to make use of the optimized IP. This is done by placing the .synopsys_dc.setup file (Appendix A) in the Unix home directory.

3.2.2

Setting Design Constraints and Attributes

Constraints and attributes that model the design environment guide the synthesis tool: Design Compiler uses them to decide on design tradeoffs and on the optimization techniques to apply. The design constraints and attributes are entered as part of the dc_script.scr file (Appendix B).

3.2.2.1 Design Environment


The design environment, such as the wire load and the operating conditions, is set to models available in the process's core library.


The 10k wire load model is used, and the operating conditions are set to the worst case during synthesis so that the s8051 operates correctly even under unfavourable conditions.

3.2.2.2 Clock Constraint


The clock constraint is the most important timing constraint in our design, as it determines the operating clock frequency of the s8051. It is therefore desirable to have a clock period that is as short as possible. As discussed in section 2.2 of this thesis, the theoretical maximum allowable clock frequency (assuming no input and output delays) when using a flash memory with a 70 ns access time is 42.9 MHz. A clock period constraint of 20 ns, which corresponds to 50 MHz > 42.9 MHz, is therefore chosen. In any design, the clock edges arriving at different registers are usually skewed, which complicates the synchronization of the internal registers. The clock uncertainty constraint is set to limit the allowable clock skew to 0.3 ns.

3.2.2.3 Input Delay


Input delay constraints specify the external delay allowed at an input port. The only timing-critical input that we may interface to the s8051 is an external memory chip, and since multiple clock cycles are allowed for each external memory access, the input delay requirement of the s8051 design can be relaxed unless the s8051 is operated at a high clock rate. Therefore, input delay constraints are set only on the memory data bus.


Other components that we may interface through the general purpose I/O ports or the built-in serial port do not require strict timing, since the shortest sampling interval on these input ports is equal to the s8051's machine cycle, which is 8 times the clock period.

[Figure 3.2 Input Delay. Timing diagram of the Clock and Input signals, marking the maximum input delay, the setup time t_setup, and the minimum input delay (equal to the hold time t_hold).]


Figure 3.2 shows how the input delay constraint affects Design Compiler while performing optimization. The maximum input delay is the difference between the clock period and the setup time required at the input. The minimum delay is the hold time required at the input.
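
The two relations described for Figure 3.2 can be written out as a short calculation. In the Python sketch below, the 20 ns clock period is the constraint used in this design, whereas the setup and hold times are hypothetical example values chosen only to illustrate the arithmetic; the function names are likewise illustrative.

```python
def max_input_delay_ns(clock_period_ns, setup_ns):
    """Maximum input delay = clock period - setup time required at the input."""
    if setup_ns > clock_period_ns:
        raise ValueError("setup time exceeds the clock period")
    return clock_period_ns - setup_ns


def min_input_delay_ns(hold_ns):
    """Minimum input delay = hold time required at the input."""
    return hold_ns


CLOCK_PERIOD_NS = 20.0  # clock constraint used in this design
SETUP_NS = 2.0          # hypothetical setup time, for illustration only
HOLD_NS = 0.5           # hypothetical hold time, for illustration only

print(max_input_delay_ns(CLOCK_PERIOD_NS, SETUP_NS), min_input_delay_ns(HOLD_NS))
```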

3.2.2.4 Output Delay


As with input delay, the only output ports that are timing critical at high clock speeds are those of the memory interface. Therefore, output delay constraints are set only on the memory address and data buses and on the read and write control pins.

3.2.2.5 Area Constraint


No area constraint is set, because the area of the final chip is determined not by the digital core but largely by the number of I/O pads. The number of I/O pads is large enough that the area enclosed by the pad ring is larger than the area required by the digital core. It is therefore desirable to reduce the number of I/O pads on the final s8051.

3.2.3

Saving Synthesized Netlist and Reports

After the synthesis process is completed, the netlist is saved in both the Synopsys database format and Verilog format, because some of the tools used after synthesis, such as the ncverilog simulator and Cadence Silicon Ensemble, do not have good support for VHDL. The Verilog netlist is later imported into Silicon Ensemble for automated place and route. The synthesis report, which contains the timing path report, area report and power report, is also generated (Appendix C).

Since the timing report shows a negative slack of only 0.99 ns against the 20 ns constraint, we would expect the s8051 to be able to handle a clock period of 20 + 0.99 = 20.99 ns. However, Design Compiler cannot automatically synthesize or verify the behavior of asynchronous logic, and in our design the ALU block has been coded as asynchronous logic. Functional simulation therefore has to be performed on the synthesized netlist to verify its correctness.


Figure 3.3 Post Synthesis Simulation Waveform s8051 executing testall.c (22ns period)

Figure 3.4 Post Synthesis Simulation Waveform s8051 executing testall.c (23ns period)

As seen in Figure 3.3, although the clock period is set to 22 ns (> 20.99 ns), the s8051 is unable to complete execution of testall.c and fails at instruction #2 (ADD A,Rn) during simulation. This indicates that the asynchronous ALU block is the cause of the timing violation.


When the clock period is increased to 23 ns (Figure 3.4), the s8051 completes execution of all instructions successfully. The s8051 can therefore be operated at 1/23 ns ≈ 43.5 MHz.
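
The timing arithmetic of this section is summarised in the following Python sketch. It uses only the figures reported above (20 ns constraint, 0.99 ns negative slack, 23 ns passing period); the function names are chosen here for illustration.

```python
def period_from_slack_ns(constraint_ns, slack_ns):
    """Clock period implied by a timing report; negative slack lengthens it."""
    return constraint_ns - slack_ns


def frequency_mhz(period_ns):
    """Clock frequency in MHz for a clock period given in ns."""
    return 1000.0 / period_ns


expected_period = period_from_slack_ns(20.0, -0.99)  # period predicted by the report
verified_period = 23.0                               # shortest period that passed simulation

print(f"{expected_period:.2f} ns, {frequency_mhz(verified_period):.2f} MHz")
```

The gap between the expected 20.99 ns and the verified 23 ns is the margin that the static timing report does not capture for the asynchronous ALU block.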

3.3

Timing Simulations

To verify the Verilog netlists generated by Design Compiler and Silicon Ensemble, timing simulations are performed using the Cadence NC Verilog simulator. The NC Verilog simulator can include logic and routing delays in the simulation to give an accurate simulation of the actual circuit.

3.3.1

How Timing Simulations Are Performed

To simulate the Verilog netlist, a top level Verilog module, i8051_top, which reflects the environment of an actual s8051 setup, is needed. This top level module connects the s8051 to models of external components such as RAM or ROM, mirroring the usual setup of an embedded microcontroller (e.g. RAM and ROM connected to the address and data buses). The Verilog model of the external ROM plays an important role, as it contains the program to be executed by the s8051. It is created by converting program executables compiled by an 8051 compiler into a Verilog ROM model. This conversion is automated by a C program, makevlog.c, which is modified from an existing program on the UCR Dalton website that converts 8051 executables to a VHDL ROM model. Finally, a test bench provides the reset pulse at the beginning of the simulation and generates the clock and other stimulus for the rest of the simulation.
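
The kind of conversion performed for the ROM model can be sketched as follows. This Python example is not makevlog.c and does not reproduce its output format: the module name `rom_model`, the port names `addr` and `data`, and the case-statement structure are all assumptions made for illustration, and the input is taken to be a raw binary image.

```python
def make_verilog_rom(image, module_name="rom_model"):
    """Return Verilog source for a combinational ROM holding `image` (bytes).

    Hypothetical output format: one case item per byte, 16-bit address.
    """
    if len(image) > 0x10000:
        raise ValueError("image does not fit in a 16-bit address space")
    lines = [
        f"module {module_name}(input [15:0] addr, output reg [7:0] data);",
        "  always @(*) begin",
        "    case (addr)",
    ]
    for address, byte in enumerate(image):
        lines.append(f"      16'h{address:04X}: data = 8'h{byte:02X};")
    lines += [
        "      default: data = 8'h00;",  # unprogrammed locations read as zero
        "    endcase",
        "  end",
        "endmodule",
    ]
    return "\n".join(lines)


# Three example bytes standing in for a compiled 8051 program image.
rom_text = make_verilog_rom(bytes([0x02, 0x00, 0x30]))
print(rom_text)
```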


3.3.2

Post-Layout Simulations

In order to include the wire and logic delays introduced during the place and route steps, SDF annotation is required. The s8051.sdf file generated by Silicon Ensemble contains the timing delay information. The following $sdf_annotate statement should be added after the wire declarations in the routed Verilog netlist file to include this delay information:
initial begin
  $sdf_annotate("./s8051.sdf", s8051, , "sdf.log", "MAXIMUM", "0.8:1:1.2", "FROM_MTM");
end

Figure 3.5 Statement for SDF Annotation

3.3.3

Simulation on s8051's Instruction Set

Figure 3.6 Post Synthesis Simulation Waveform s8051 executing testall.c (23ns period)

testall.c tests the ability of an 8051 compatible processor to execute all instructions except ACALL, LCALL, RET, RETI, and MOVX. If any instruction fails, the program outputs the instruction number to P1 and stops. Otherwise, 127 is output to P1 when all instructions have passed. As shown in the simulation waveform, the s8051 executes all instructions correctly at approximately 43.5 MHz.

3.3.4

Simulation on s8051's Serial Port and Timer

Figure 3.7 Post Layout Simulation Waveform of s8051's Serial Port in mode 1

Figure 3.7 shows the post-layout simulation waveform of the s8051 transmitting AA (hex) from its UART TX pin (port3[1]). Timer 1 is configured in mode 2 (auto reload) to generate the serial port baud rate. As seen in the simulation waveform, the first bit transmitted is the start bit (0), and the transmission ends with a stop bit (1).
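
The bit sequence expected on the TX pin can be worked out by hand or with a few lines of code. The Python sketch below assumes the standard 8051 serial mode 1 frame (one start bit, eight data bits sent least significant bit first, one stop bit); the function name is chosen here for illustration.

```python
def uart_frame(byte):
    """Return the 10 bits of a mode 1 UART frame in transmission order."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte out of range")
    data_bits = [(byte >> i) & 1 for i in range(8)]  # least significant bit first
    return [0] + data_bits + [1]                     # start bit, data, stop bit


frame = uart_frame(0xAA)
print("".join(str(bit) for bit in frame))
```

Because 0xAA is 10101010 in binary, the data bits alternate, which makes the waveform easy to check visually against the baud rate.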


3.3.5

Simulation on s8051 Executing loader.c

The simulation of loader.c (Figure 3.8) verifies the s8051's ability to handle timer interrupts, which are used to generate the serial clock required by the SPI bus.

Figure 3.8 Simulation Waveform of SD Card Reset Command (loader.c)


In the above simulation waveform, the s8051 is sending the SD card reset command (CMD0) sequence over its SPI bus (emulated with its general purpose I/O). As shown in the figure, the CMD0 sequence is 40 00 00 00 00 95 (hexadecimal), or 01000000 00000000 00000000 00000000 00000000 10010101 (binary).
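
The six bytes of the CMD0 sequence can be reproduced from the command format. The Python sketch below is not taken from loader.c; it assumes the usual SD command layout (0x40 OR-ed with the command index, a 32-bit argument, then a 7-bit CRC with generator x^7 + x^3 + 1 followed by a stop bit of 1), and the function names are chosen here for illustration.

```python
def crc7(data):
    """CRC-7 with generator x^7 + x^3 + 1, processed most significant bit first."""
    crc = 0
    for byte in data:
        for bit in range(7, -1, -1):
            in_bit = (byte >> bit) & 1
            msb = (crc >> 6) & 1
            crc = (crc << 1) & 0x7F
            if in_bit ^ msb:
                crc ^= 0x09
    return crc


def sd_command(index, argument):
    """Build the 6-byte frame for SD command `index` with a 32-bit argument."""
    if not 0 <= index <= 63:
        raise ValueError("command index out of range")
    body = bytes([0x40 | index]) + argument.to_bytes(4, "big")
    return body + bytes([(crc7(body) << 1) | 1])  # CRC7 followed by the stop bit


def to_bits(frame):
    """Format a frame as space-separated 8-bit binary groups."""
    return " ".join(f"{byte:08b}" for byte in frame)


cmd0 = sd_command(0, 0)
print(cmd0.hex(" "), "|", to_bits(cmd0))
```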

The above simulations are used to successfully verify all features implemented in the s8051 design before the final layout is sent for fabrication.


3.4

Place and Route

Placement and routing are done automatically using Cadence Silicon Ensemble [7] once the Verilog netlist synthesized by Design Compiler has been verified through functional simulations. Silicon Ensemble (SE) is an area based standard cell placement and routing tool. It places cells based on a given area and the dimensions of the chip. After placement, the router tries to route within the given area without changing the placement of cells. AMS provides a script, ams_se, that can be called from the Unix shell to set up SE to use AMS's technology file [8]. The ams_se script performs the following operations:
- sets up the proper place and route directory structure
- prepares se.ini, which initializes Silicon Ensemble
- prepares macro files with commands that guide SE through the place and route process
- prepares gcf files for importing CTLFs into SE
- prepares a DEF/power_corner.def template file that can be used to insert power pads and corner cells into the design
- prepares CTGen command files for clock tree generation
- prepares a GDSII map file (gds2.map)


3.4.1

Silicon Ensemble Design Flow

Silicon Ensemble completes the whole placement and routing process according to the gemma.mac macro file (Appendix D) prepared by the ams_se script. The gemma.mac file is modified to suit our design before being executed by SE to prepare the final layout.

3.4.1.1 Loading the Design Database


As soon as SE is started, the design database has to be loaded. First, the Library Exchange Format (LEF) files are loaded. The LEF files contain the library information: descriptions of the digital standard cells, routing layers and vias. Next, the Timing Library Format (TLF) or Compiled Timing Library Format (CTLF) files are read in to specify timing information. This is also important for back annotation, in order to write a complete RSPF file including pin capacitances. A Global Constraint File (c35b43.3V.gcf, Appendix E) that includes references to the CTLF files as well as the operating conditions is used to import the timing data. Here, the worst-case operating conditions are again used. The above database files are loaded for the core library and periphery cells of the AMS CMOS technology, as well as for the 128-byte SRAM IP.
##-- Import Library Data
##-- LEF
FINPUT LEF F /app11/AMS_3.51_CDS_F/artist/HK_C35/LEF/c35b4/c35b4.lef ;
INPUT LEF F /app11/AMS_3.51_CDS_F/artist/HK_C35/LEF/c35b4/CORELIB.lef ;
INPUT LEF F /app11/AMS_3.51_CDS_F/artist/HK_C35/LEF/c35b4/IOLIB_4M.lef ;
INPUT LEF F /class2/ug2/e05699a1/memory/cadence/sram128x8.lef ;
##-- CTLF Timing
##-- GCF File
INPUT CTLF INITFILE "./c35b43.3V.gcf" ;

Figure 3.9 Commands to Import Library and Timing Data


After the timing data is imported, the Verilog models of the digital standard cells from the Hit-Kit have to be loaded before the Verilog netlist synthesized by Design Compiler can be imported and stored in DesignLib directory.
##-- Import Design Data
##-- Verilog
INPUT VERILOG FILE /app11/AMS_3.51_CDS_F/verilog/c35b4/c35_CORELIB.v LIB DesignLib ;
INPUT VERILOG FILE /app11/AMS_3.51_CDS_F/verilog/c35b4/c35_IOLIB_4M.v LIB DesignLib ;
INPUT VERILOG FILE /class2/ug2/e05699a1/memory/cadence/sram128x8.v LIB DesignLib ;
INPUT VERILOG FILE ./VERILOG/s8051.v LIB DesignLib REFLIB "DesignLib" DESIGN DesignLib.s8051:hdl ;

Figure 3.10 Commands to Import Library Model and Design Netlist


The Verilog netlist does not include power pads or corner cells, thus they must be added with an additional DEF file (Appendix F power_corner.def).

3.4.1.2 Floorplan
After the design library files are loaded, the synthesized netlist is translated into a floorplan in Silicon Ensemble. Following guidelines from AMS, the I/O to core distance is set to 100 µm, the minimum row spacing is set to 0, and cells are allowed to abut and to be flipped. Finally, the floorplan is initialized by setting the minimum chip dimensions that fit the required I/O pads. For the final s8051 design, the height of the core is fixed at 2380.800 µm to match the height of an analog to digital converter design as well as to accommodate 17 I/O pads on the right side of the chip. The row utilization is set to 65%, which results in a chip width just sufficient to accommodate 14 I/O pads on each of the top and bottom sides of the chip. At 65% row utilization, there is ample space for the routing of nets and for clock tree generation.


3.4.1.3 Placing I/O Pads and SRAM Block


The I/O pads are placed before any other components are placed on the floorplan. The pads are placed at the desired locations along the four sides of the chip, and the locations are entered in a file named ioplaced.ioc (Appendix G). The s8051's Port 0 and the least significant nibble of Port 2 are used to interface with the ADC, so they are placed on the left side of the chip. Their I/O pads will later be removed using the Cadence Virtuoso tool, and the signals will be routed manually from the ADC to the corresponding pins. The sram128x8 IP block is placed next, at the top left hand corner, and is oriented such that the block's I/O pins face the center of the core (right and bottom) instead of the borders of the chip, to ease the routing of nets from the core to the SRAM block. Rows are cut around the SRAM block to form a block halo that separates the block from the core.

3.4.1.4 Power Route Planning


Power and ground rings of width 40 µm are placed around the core, spaced 2 µm apart, to connect the power supply to the core. Block rings of width 10 µm are placed around the SRAM block with 2 µm spacing to supply the SRAM block. In addition to the power rings, power strips are laid vertically across the center of the chip.


3.4.1.5 Placing Standard Cells


Before placing standard cells, the CAP cells are put at the left and right ends of rows. Finally the standard cells are placed with default setting.

3.4.1.6 Clock Tree Generation


A clock tree is generated using CT-Gen which determines the optimum clock tree for a given placement.

3.4.1.7 Placing Filler Cells


The gaps between standard cells are filled with filler cells to avoid design rule violation. The space between periphery cells is filled with periphery spacer cells that will connect power supply lines.

3.4.1.8 Power Routing


After the periphery cells are placed, power routing is done by connecting the supply nets around the core to the two core power rings, and the SRAM block to the block power rings. The I/O pads are also supplied with power from the I/O rings.


3.4.1.9 Clock and Normal Nets Routing


The clock nets are routed before all other remaining nets within the core are routed with Wroute, a timing driven router.

3.4.1.10

Saving of Design Files

After placement and routing are completed successfully, the design files, which include the post-layout Verilog netlist, the SDF delay file and the parasitic RSPF file, are saved for use in post-layout verification. The design is also saved in gds2 format to be streamed into the Cadence Virtuoso layout tool for final preparation before being submitted to the silicon vendor for fabrication.

Figure 3.11 Final Layout with Analog to Digital Converter (left) and s8051 (right) Extracted from Virtuoso


Figure 3.12 s8051 Layout Extracted from Silicon Ensemble


The final s8051 layout has an area of 5.2 mm² and dimensions of 2187 µm x 2380.8 µm. The layout with the analog design part combined with the s8051 has an area of 7.3 mm² (3064.275 µm x 2380.8 µm).
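
The quoted areas follow directly from the layout dimensions, as the short Python check below shows. It uses only the dimensions given above; the function name is chosen here for illustration.

```python
def area_mm2(width_um, height_um):
    """Area in square millimetres of a rectangle given in micrometres."""
    return width_um * height_um / 1e6  # 1 mm^2 = 10^6 um^2


s8051_area = area_mm2(2187.0, 2380.8)        # s8051 alone
combined_area = area_mm2(3064.275, 2380.8)   # s8051 together with the analog part

print(f"{s8051_area:.2f} mm^2, {combined_area:.2f} mm^2")
```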


CHAPTER 4 BUILDING DEVELOPMENT TOOLS AND TESTING OF S8051 CHIP

The s8051 was fabricated in the July 2004 run and was delivered in October 2004. As with any IC that is taped out, a turn-on board is required to test the silicon. Such a board is designed to exercise all the features of the IC, so it is usually not laid out for minimum board size. A development printed circuit board (PCB) has been designed to turn on the fabricated chip, and a programmer board has been designed and implemented on an FPGA development board [12] to program the flash memory used in our project. The results of this chapter will serve as a development platform for the ECG firmware development, which will be carried out separately from the sub-project reported in this thesis.


4.1

Printed Circuit Board Design

A PCB is designed and fabricated according to the schematic diagram shown in Figure 4.1. The corresponding PCB layout is shown in Figure 4.2. A higher resolution version of the PCB layout can be found in Appendix H.
[Figure 4.1 s8051 Test Circuit Schematic Diagram. Blocks shown: S8051, oscillator module (CLK), SD card (SS, MISO, SCK, MOSI), M29W010B 128 kbytes flash memory, AS7C3256 32 kbytes SRAM, and MAX3221 RS232 driver with RS232 interface. Signal labels shown: P0, P2, P1_0 to P1_7, P3_0 to P3_7, RST, data, addr, rd, wr, rom_e/xrm_e, and the memory control pins g, w, e, oe, we, ce.]


The MAX3221 [5] IC converts the UART signals from the s8051 to RS232 levels so that it can communicate with a PC's serial port. The flash memory stores the 8051's firmware, while the SRAM allows the 8051 to store data during operation. An SD card connector is also included to allow interfacing with an SD card. A programmable clock module is selected that allows us to test the s8051 at 8 different clock frequencies up to ~20 MHz.


[Figure 4.2 s8051 Test Circuit PCB Layout, showing the top layer and the bottom layer with components labeled A to J.]

Components:
A: s8051
B: MAX3221
C: buttons
D: M29W010B 128 kbytes Flash Memory
E: NAND gates for implementing Chip Select between Flash and SRAM
F: SD Card connector
G: Oscillator module
H: AS7C3256 32 kbytes SRAM
I: 3.3 V Voltage regulator
J: Other IO pins


4.2

Programming Flash Memory

Two components are needed to program the flash memory that stores the firmware executed by the s8051: hardware that interfaces directly with the flash memory chip (the programmer board), and software running on a Windows PC that sends the firmware's binary code to the programmer board.

4.2.1

Serial Port Terminal Program

Figure 4.3 Serial Port Terminal and Programmer Graphical User Interface


A serial port terminal program is developed using MS Visual Basic to monitor the flow of data to and from the s8051's UART. A flash memory programmer is also implemented as part of the serial port terminal program, to program the flash memory used in our project with the firmware to be executed by the s8051.

4.2.2

Programmer Board

Table 4.1 Programmer Board Commands

COMMANDS (one line per bit of the command byte, entries in table row order):
Bit 7: 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
Bit 6: 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1
Bit 5: 0 0 0 0 0 0 0 0 0 1 1 1 PC5 PC11 0 1
Bit 4: 0 0 0 0 0 0 0 0 1 0 1 1 PC4 PC10 PC16
Bit 3: 0 0 0 0 0 0 0 0 D7 D3 0 1 PC3 PC9 PC15
Bit 2: 0 0 0 0 1 1 1 1 D6 D2 E E PC2 PC8 PC14
Bit 1: 0 0 1 1 0 0 1 1 D5 D1 W W PC1 PC7 PC13
Bit 0: 0 1 0 1 0 1 0 1 D4 D0 G G PC0 PC6 PC12

CHANGES TO SIGNAL VALUE AND TASK PERFORMED BY PROGRAMMER BOARD (one line per column, entries in table row order):
ADDR: 0x555 0x2AA 0x555 0x555 PC PC+1 PC+1 PC 0x555 PC PC PC PC PC PC+1
DATA: 0xAA 0x55 0x80 0x10 High-Z High-Z High-Z High-Z 0xA0 D High-Z D
W: 0 0 0 0 1 1 1 1 0 0 W W W
G: 1 1 1 1 0 0 0 0 1 1 G G G
E: E E E
SEND DATA: No No No No Yes Yes No No No No No No No No No No

*note:
PC and D are internal registers of the programmer board.
Yes in the SEND DATA column indicates that the value on the DATA bus will be transmitted to the PC via the serial port.
- denotes value unchanged.
PCx = the xth bit of the PC register will be updated.
Dx = the xth bit of the D register will be updated.

The programmer software communicates with a programmer board implemented on the Spartan-IIE FPGA development board [12] using sequences of one-byte commands sent through the serial port. These commands control the address bus, the data bus, the read and write control pins and the memory enable pin on the programmer board in order to program the flash memory. The commands in Table 4.1 are designed so that all read and program operations on the flash memory chip can be performed.

[Figure 4.4 Connections Between Programmer Board, PC and Flash Memory Chip. The PC comm port (DTE) is connected through the serial link to the programmer board (DCE), which drives the data, addr, g, w and e signals of the M29W010B 128 kbytes flash memory.]

The command set is also optimized for shorter programming time by representing frequently used flash commands with fewer command bytes. For example, writing the byte 0xAA to memory location 0x555, which is frequently needed when programming a byte (as shown in Table 4.2 and Figure 4.5), is assigned to the two-byte command sequence 0x00, 0x33: sending these bytes commands the programmer to write the value 0xAA to address 0x555 of the flash memory. In general, sending the appropriate sequence of one-byte commands to the programmer board makes the signals at the memory interface perform the intended task, such as reading data bytes from, or writing data bytes to, the flash memory. In order to program the flash memory, a sequence of bus write operations has to be performed on the flash memory chip by the programmer.


Table 4.2 Flash Memory Commands

All bus write operation to the flash memory is interpreted by its command interface according to commands in Table 4.2.

[Figure 4.5 Programmer Board Waveform. Signals shown: RECEIVED BYTE (0x00, then 0x33), ADDR (0x555), DATA (0xAA), W, G and E.]


Therefore, to program the value 0xDD to memory location 0x00 (assuming the register PC of the programmer board is initialized to 0), the bytes 0xAA, 0x55, 0xA0 and 0xDD have to be written to memory locations 0x555, 0x2AA, 0x555 and 0x000 respectively. This corresponds to the sequence of bytes 0x00, 0x33, 0x01, 0x33, 0x1D, 0x33, 0x2D, 0xE3 being sent from the PC to the programmer board through the serial port. Continuing with 0x00, 0x33, 0x01, 0x33, 0x1B, 0x33, 0x2B, 0xE3 programs the byte 0xBB to the next memory location (0x0001), as the command 0xE3 increments the PC register from 0 to 1. Programming one byte into the flash memory therefore requires 8 bytes to be sent to the programmer via the serial port. The programmer is designed to communicate with the PC's serial port at a baud rate of 115.2 kbps. Since 10 bits are sent for every byte (1 start bit, 8 data bits and 1 stop bit), the programming rate is

$$\frac{115.2\,\text{kbps}}{8\,\text{bytes} \times 10\,\text{bits sent per byte}} = 1.44\,\text{kbytes/s}.$$
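
The command sequence and the resulting programming rate can be generated with a few lines of code. The Python sketch below is not the Visual Basic programmer software; it only reproduces the eight-byte pattern of the two examples above (0xDD and 0xBB), and the function names and default parameters are chosen here for illustration.

```python
def program_byte_commands(value):
    """Return the 8 command bytes that program `value` at the current PC."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value out of range")
    high, low = value >> 4, value & 0x0F
    return [
        0x00, 0x33,         # write 0xAA to 0x555
        0x01, 0x33,         # write 0x55 to 0x2AA
        0x10 | high, 0x33,  # load high nibble of D, write 0xA0 to 0x555
        0x20 | low, 0xE3,   # load low nibble of D, write D at PC, increment PC
    ]


def programming_rate_bytes_per_s(baud=115200, commands_per_byte=8, bits_per_command=10):
    """Flash bytes programmed per second over the serial link."""
    return baud / (commands_per_byte * bits_per_command)


sequence = program_byte_commands(0xDD)
print([f"0x{b:02X}" for b in sequence], programming_rate_bytes_per_s())
```

Calling `program_byte_commands` once per firmware byte, in address order, yields the full stream for a firmware image, since the final 0xE3 of each group advances the PC register.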

4.3

Debugging the s8051

The serial port terminal software, the programmer board and the s8051 PCB, together with an 8051 compiler, form the debugging and development tools for the s8051 chip. The open source compiler SDCC [16] is used in our project to compile all the 8051 source code. To test the s8051 for flaws, a test program is written and programmed into the flash memory. Errors are detected by observing the output signals at the general purpose I/O ports, or by sending debug information from the s8051 through its serial port to the serial port terminal software running on a Windows PC. If the output signals match the expected values, the s8051 is considered to be functioning properly for the tested functions. Otherwise, a very low frequency clock may be fed into the s8051 while the address bus is monitored with an oscilloscope, and the program flow can be compared against functional simulation waveforms that are known to be correct in order to trace any fault. However, this can be very tedious and time consuming.

Figure 4.6 Output from s8051's Serial Port While Developing loader.c

Figure 4.7 Photo of s8051 Connected to the Casira Bluetooth Development Board


The s8051, which we sent for fabrication in July 2004, passed all the instructions tested with testall.c at clock frequencies up to 80MHz (generated from a signal generator). Figure 4.6 shows the output from the s8051's serial port, sent via the Casira Bluetooth Development Kit to a PC's Bluetooth module during the development of loader.c. Figure 4.7 shows a photo of the s8051 connected to the Casira Bluetooth Development Board. The s8051 chip is used to successfully implement an SD Card loader firmware. Since the s8051 correctly executes testall.c as well as other firmware (e.g. loader.c) that exercises 8051 features not tested in testall.c (built-in timer, serial port and interrupt handling), the s8051 chip fabricated during the first fabrication run is considered to be working flawlessly.


CHAPTER 5 INTERFACING SECURE DIGITAL CARD AND BLUETOOTH SERIAL PORT PROFILE

As described in the previous chapters, the ECG device needs an SD card to store ECG data and the 8051 program (to reduce component count), as well as wireless Bluetooth capability; these needs motivate this chapter. To demonstrate that the s8051 is capable of interfacing an SD card, a feature that boot loads from the SD card is implemented on the s8051 chip. This also verifies that the s8051 has enough spare processing power to handle the SPI protocol at the required transfer rate. The Casira Bluetooth development kit is used to develop the serial port profile that connects the s8051 to Bluetooth-enabled PCs and handhelds.

5.1

SD Card Implementation and 8051 Processing Capability Considerations

The Secure Digital (SD) Card [14] is a flash-based memory card designed to meet the security, capacity, performance and environmental requirements of today's consumer electronics devices. SD Card communication is based on a nine-pin interface (Clock, Command, 4xData and 3xPower lines) designed to operate in a low voltage range. In addition to the SD Card interface, an alternate communication protocol based on the SPI standard can be used to interface the SD card. The current SD Card provides up to 1024 million bytes (1GB) of memory using flash memory chips designed especially for mass storage applications.


The high capacity, small form factor and low-voltage operation of the SD Card make it a desirable storage device for storing large amounts of ECG data in a portable ECG device.

Figure 5.1 SD Card Block Diagram


In this project, the s8051 is interfaced to the SD Card via the SPI bus instead of the SD bus because SPI is easier to implement and requires fewer pins. As discussed in the Area Constraint section of Chapter 3, the final silicon area taken up by the s8051 is determined by the number of I/Os, so reducing the pin count lowers the chip cost. Although the SD bus interface (with its wider bus) allows up to 4x faster transfers, it is not required, as the SPI interface meets the bandwidth requirement of the ECG system. At a 200Hz sampling rate with 3 channels of 12-bit data, the bandwidth requirement is only 200 x 3 x 12 bits/s = 7.2kbps. The main concern is whether the s8051 has sufficient processing power to emulate an SPI bus.


Through simulations of loader.c, which is developed later in this chapter, we have found that the s8051 running at 19.7MHz can comfortably transfer data via SPI to the SD card at 15.17kHz while handling the FAT16 file system. This leaves more than half of the s8051's processing capability for other functions, such as reading data from the ADC or responding to user input. At 19.7MHz, the s8051 is capable of $19.7 / 8 \approx 2.5$ MIPS. Since roughly half of the processing is used for SD card interfacing and the FAT16 file system, about 1.35 MIPS is left for these ECG functions. A read of one sample from the ADC registers should not take more than a few instructions (an 8-bit byte read instruction is 2 machine cycles long on the s8051), so reading 7.2kbps of data requires only on the order of $7.2 / 8 = 0.9$ KIPS (thousands of instructions per second) out of the 1.35 MIPS left. The s8051 is capable of operating at frequencies higher than 40MHz. We can therefore resort to a higher clock frequency should other functions require more processing capability.
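The bandwidth and throughput figures used in this section follow from simple arithmetic, sketched below. The function names are assumptions made for this example; the 19.6608MHz oscillator value and the eight clock cycles per instruction are the figures quoted elsewhere in this report.

```c
/* Raw ECG data rate in bits per second:
 * sampling rate x number of channels x bits per sample. */
unsigned long ecg_bit_rate(unsigned long sample_hz, unsigned channels,
                           unsigned bits_per_sample)
{
    return sample_hz * channels * bits_per_sample;
}

/* Instruction throughput in instructions per second for a core that
 * needs clocks_per_instr clock cycles per instruction. */
unsigned long instr_per_sec(unsigned long clock_hz, unsigned clocks_per_instr)
{
    return clock_hz / clocks_per_instr;
}
```

With these helpers, ecg_bit_rate(200, 3, 12) returns 7200 (7.2kbps, or 900 bytes per second), and instr_per_sec(19660800UL, 8) returns 2457600, which is the approximately 2.5 MIPS quoted above.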


Figure 5.2 SD Card Pin Assignment and Definitions in SPI Mode

5.2

Serial Peripheral Interface (SPI)

The Serial Peripheral Interface (SPI) is a full-duplex, synchronous, serial data link that is standard across many microprocessors, microcontrollers and peripherals. It enables communication between microprocessors and peripherals, as well as inter-processor communication. The SPI bus consists of four wires that carry information between the devices connected to the bus: Serial Clock (SCK), Master Out Slave In (MOSI), Master In Slave Out (MISO) and Slave Select (SS). It can be used to interface a single master (host) with multiple slaves.


The SCK control line is driven by the SPI master and regulates the flow of data bits. The SCK line is clocked once for each bit in the transmitted data. The MOSI data line carries the output data from the master, which is shifted in as the input data of the selected slave. The MISO data line carries the output data from the selected slave to the input of the master. When an SPI transfer occurs, an 8-bit data word is shifted out on one interface pin while a different 8-bit data word is shifted in on another interface pin. This can be viewed as an 8-bit shift register in the SPI master and another 8-bit shift register in the SPI slave, connected as a circular 16-bit shift register. When a transfer occurs, this 16-bit shift register is shifted 8 positions, thus exchanging the 8-bit data between the master and slave devices.
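The circular shift register view of an SPI transfer can be modelled in software as follows. This is a behavioural sketch for illustration only; spi_exchange() is a hypothetical name and is not the run_spi() routine of loader.c.

```c
#include <stdint.h>

/* Model of one 8-bit SPI transfer as a circular 16-bit shift register.
 * On every clock the master's MSB travels over MOSI into the slave's LSB,
 * while the slave's MSB travels over MISO into the master's LSB.
 * After 8 clocks the two bytes have been exchanged. */
void spi_exchange(uint8_t *master, uint8_t *slave)
{
    for (int bit = 0; bit < 8; bit++) {
        uint8_t mosi = (uint8_t)((*master >> 7) & 1u); /* bit leaving master */
        uint8_t miso = (uint8_t)((*slave >> 7) & 1u);  /* bit leaving slave  */
        *master = (uint8_t)((*master << 1) | miso);
        *slave  = (uint8_t)((*slave << 1) | mosi);
    }
}
```

If the master register holds 0xA5 and the slave register holds 0x3C before the call, the master holds 0x3C and the slave holds 0xA5 afterwards.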

Figure 5.3 SD Card SPI Bus Topology

Figure 5.4 Waveform Showing Data Transfer on SPI Bus


5.3

SD Card's SPI Protocol

The SD Card's SPI channel is byte oriented. Every command or data block is built of 8-bit bytes and is byte aligned (multiples of eight clocks) to the CS signal (the SS signal of the SPI bus). SPI messages are built from command, response and data-block tokens. All communication between host and cards is controlled by the host (master). The host starts every bus transaction by asserting the CS signal low. The selected card always responds to a host command with a 1-byte or 2-byte response. When the card encounters a data retrieval problem, it responds with an error response. In addition to the command response, every data block sent to the card during write operations is answered with a special data response token.

5.3.1

SPI Mode Selection

The SD Card wakes up in SD Bus mode. It enters SPI mode if the CS signal is asserted (low) during the reception of the reset command (CMD0). If the card recognizes that SD Bus mode is required, it does not respond to the command and remains in SD Bus mode. If SPI mode is required, the card switches to SPI mode and responds with the SPI mode R1 response. Once in SPI mode, the only way to return to SD Bus mode is to power cycle the card. By default, the command protocol for SPI mode has CRC checking disabled. However, since the card powers up in SD Bus mode, CMD0 must be followed by a valid CRC byte.


CMD0 is a static command and always generates the same 7-bit CRC of 4Ah. Appending the end bit of 1 (bit 0) to the CRC creates a CRC byte of 95h. The entire CMD0 sequence, 40 00 00 00 00 95 (hexadecimal), can therefore be used to send CMD0 in all situations in SPI mode (with or without CRC). Every SD Card token transferred on the bus is protected by CRC bits. In SPI mode, the SD Card offers a non-protected mode, which enables systems built with reliable data links to exclude the hardware or firmware required for implementing the CRC generation and verification functions. In the non-protected mode, the CRC bits of the command, response and data tokens are still required in the tokens but are defined as "don't care" for the transmitters and are ignored by the receivers. For simplicity, the default non-protected mode is used in our project.
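The fixed CRC byte of CMD0 can be checked with a small bitwise CRC routine. The sketch below assumes the 7-bit generator polynomial x^7 + x^3 + 1 used for SD command tokens; the function names are hypothetical, and this code is not part of loader.c, which sends the constant 95h directly.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC7 over a byte buffer, generator x^7 + x^3 + 1 (0x09),
 * initial value 0, data processed MSB first. Returns the 7-bit CRC. */
uint8_t sd_crc7(const uint8_t *data, size_t len)
{
    uint8_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t d = data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (uint8_t)(crc << 1);         /* old bit 6 is now in bit 7 */
            if ((d & 0x80u) ^ (crc & 0x80u)) { /* input bit XOR CRC MSB     */
                crc ^= 0x09u;
            }
            d = (uint8_t)(d << 1);
        }
    }
    return (uint8_t)(crc & 0x7Fu);
}

/* Last byte of a command token: 7-bit CRC followed by the end bit (1). */
uint8_t sd_crc_byte(const uint8_t *data, size_t len)
{
    return (uint8_t)((sd_crc7(data, len) << 1) | 1u);
}
```

Applied to the first five bytes of CMD0 (40 00 00 00 00), sd_crc7() returns 4Ah and sd_crc_byte() returns 95h, in agreement with the values above.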

5.3.2

Reset Sequence

The SD Card requires a defined reset sequence. After a power-on reset or CMD0 (software reset), the card enters an idle state. In this state, the only legal host commands are CMD1 (SEND_OP_COND), ACMD41 (SD_SEND_OP_COND), CMD59 (CRC_ON_OFF) and CMD58 (READ_OCR). The host must poll the card (by repeatedly sending CMD1) until the in-idle-state bit in the card response is cleared to 0, indicating that the card has completed its initialization and is ready for the next command.


5.3.3

Reading Single Block

SPI mode supports single block and multiple block read operations (SD Card CMD17 or CMD18). Upon reception of a valid read command, the card responds with a response token followed by a data token of the length defined by a previous SET_BLOCK_LENGTH (CMD16) command, as shown in Figure 5.5. By default, the block length is equal to the maximum block length of 512 bytes.

Figure 5.5 Single Block Read Operation


The 16-bit CRC appended to the end of the data block is not checked in this project.

5.3.4

SPI Command

All the SD Card commands are 6 bytes long and transmitted MSB first.

Figure 5.6 SD Command Format


Table 5.1 Selected SPI Bus Command

Table 5.1 provides a detailed description of the selected SPI bus commands used in the SD Card loader program. The binary code of a command is defined by its mnemonic symbol. For example, the content of the Command field for CMD0 is (binary) 000000 and for CMD39 is (binary) 100111.
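To illustrate the 6-byte command format, the sketch below packs a command index, a 32-bit argument and a CRC byte into a command token, MSB first. The function name and signature are assumptions for this example and do not correspond to a routine in loader.c. The first byte combines the start bit (0), the transmission bit (1) and the 6-bit command index, which gives 40h for CMD0.

```c
#include <stdint.h>

/* Build a 6-byte SD command token, transmitted MSB first:
 *   byte 0    : start bit 0, transmission bit 1, 6-bit command index
 *   bytes 1-4 : 32-bit argument, most significant byte first
 *   byte 5    : 7-bit CRC followed by the end bit 1 (supplied by caller;
 *               ignored by the card in non-protected SPI mode) */
void sd_build_command(uint8_t index, uint32_t arg, uint8_t crc_byte,
                      uint8_t frame[6])
{
    frame[0] = (uint8_t)(0x40u | (index & 0x3Fu));
    frame[1] = (uint8_t)(arg >> 24);
    frame[2] = (uint8_t)(arg >> 16);
    frame[3] = (uint8_t)(arg >> 8);
    frame[4] = (uint8_t)arg;
    frame[5] = crc_byte;
}
```

Calling sd_build_command(0, 0, 0x95, frame) produces 40 00 00 00 00 95, the CMD0 sequence used to reset the card.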

5.3.5

Responses

There are several types of response tokens, depending on the command received by the SD Card. For the three commands used, the only response is the R1 response. This response token is sent by the card after every command, with the exception of SEND_STATUS commands. It is 1 byte long; the MSB is always zero and the other bits are error indications.

Figure 5.7 R1 Response Format


A 1 at the corresponding bit signifies an error:
- Idle state: The card is in idle state and running the initializing process.
- Erase reset: An erase sequence was cleared before executing because an out-of-erase-sequence command was received.
- Illegal command: An illegal command code was detected.
- Communication CRC error: The CRC check of the last command failed.
- Erase sequence error: An error in the sequence of erase commands occurred.
- Address error: A misaligned address, which did not match the block length, was used in the command.
- Parameter error: The command's argument (e.g., address, block length) was out of the allowed range for the card.
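The R1 flags can be tested with simple bit masks. The sketch below assumes the standard SPI-mode R1 layout, in which the flags listed above occupy bit 0 (idle state) through bit 6 (parameter error) in the order given; the macro and function names are hypothetical.

```c
#include <stdint.h>

/* R1 response bits (bit 7 is always 0 in a valid response). */
#define R1_IDLE_STATE      0x01u
#define R1_ERASE_RESET     0x02u
#define R1_ILLEGAL_COMMAND 0x04u
#define R1_CRC_ERROR       0x08u
#define R1_ERASE_SEQ_ERROR 0x10u
#define R1_ADDRESS_ERROR   0x20u
#define R1_PARAMETER_ERROR 0x40u

/* A byte is a valid R1 response only if its MSB is zero. */
int r1_is_valid(uint8_t r1)
{
    return (r1 & 0x80u) == 0;
}

/* The card is ready when no flag is set, including the idle flag. */
int r1_is_ready(uint8_t r1)
{
    return r1 == 0x00u;
}

/* Error flags only, with the idle-state bit masked off. */
uint8_t r1_error_bits(uint8_t r1)
{
    return (uint8_t)(r1 & 0x7Eu);
}
```

During the reset sequence, the host would keep polling while the idle-state bit is set and r1_error_bits() returns 0, and proceed once r1_is_ready() holds.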

5.3.6

Data Token

Read and write commands have data transfers associated with them. Data is transmitted or received via data tokens. Data tokens are 4 to 515 bytes long and have the following format for single block read:
- First byte: 254 (FEh)
- Bytes 2-513 (depending on the data block length): user data
- Last two bytes: 16-bit CRC
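A minimal sketch of how a received data token could be checked is given below. The function name is an assumption for this example; in line with the non-protected mode used in this project, the two CRC bytes are accounted for in the length but not verified.

```c
#include <stddef.h>
#include <stdint.h>

#define SD_START_BLOCK_TOKEN 0xFEu /* first byte of a data token (254) */

/* Return a pointer to the user data inside a single-block-read data token,
 * or NULL if the token is malformed.
 * Layout: 1 start byte + block_len data bytes + 2 CRC bytes. */
const uint8_t *sd_token_payload(const uint8_t *token, size_t token_len,
                                size_t block_len)
{
    if (token == NULL || token_len != block_len + 3u) {
        return NULL;
    }
    if (token[0] != SD_START_BLOCK_TOKEN) {
        return NULL;
    }
    return token + 1; /* the CRC bytes at the end are ignored */
}
```

For the default block length of 512 bytes, token_len is 515, the upper bound quoted above.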

5.3.7

Card Responses to Single Block Read

The following timing diagram describes the card response to single block read operations.


Figure 5.8 Single Block Read Timing Diagram

5.4

FAT16 File System

To allow files created under MS Windows to be read from the 8051, the FAT16 file system [13] is implemented. Under a FAT16 system, the storage device is organized as follows:

Table 5.2 Master Boot Record (MBR) located at the first sector (512 bytes)
Offset  Description                        Size
000h    Executable Code (Boots Computer)   446 Bytes
1BEh    1st Partition Entry                16 Bytes
1CEh    2nd Partition Entry                16 Bytes
1DEh    3rd Partition Entry                16 Bytes
1EEh    4th Partition Entry                16 Bytes
1FEh    Executable Marker (55h AAh)        2 Bytes


Table 5.3 Partition entry (part of MBR)


Offset  Description                                                                          Size     Example entry
00h     Current State of Partition (00h=Inactive, 80h=Active)                                1 Byte   00h
01h     Beginning of Partition - Head                                                        1 Byte   03h
02h     Beginning of Partition - Cylinder/Sector (See Below)                                 2 Bytes  0006h
04h     Type of Partition (See List Below)                                                   1 Byte   06h
05h     End of Partition - Head                                                              1 Byte   0Fh
06h     End of Partition - Cylinder/Sector                                                   2 Bytes  CCE0h
08h     Relative Sector, # of Sectors Between the MBR and the First Sector in the Partition  4 Bytes  00000065h
0Ch     Number of Sectors in the Partition                                                   4 Bytes  0007999Bh

From the relative sector found in the partition entry, the start of a partition is located.
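The lookup of the relative sector can be sketched as follows, using the offsets of Tables 5.2 and 5.3. Multi-byte fields on a FAT disk are stored least significant byte first. The function names are assumptions for this example and are not the routines used in loader.c.

```c
#include <stdint.h>

/* Read a 32-bit little-endian value from a byte buffer. */
uint32_t rd_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return the relative sector (start of partition) of partition entry n
 * (0..3) from a 512-byte MBR, or 0 if n is out of range or the
 * executable marker 55h AAh is missing. Sector 0 holds the MBR itself,
 * so 0 is never a valid partition start. */
uint32_t mbr_partition_start(const uint8_t mbr[512], unsigned n)
{
    if (n > 3u || mbr[0x1FE] != 0x55u || mbr[0x1FF] != 0xAAu) {
        return 0;
    }
    return rd_le32(&mbr[0x1BE + 16u * n + 0x08u]);
}
```

For the example entry in Table 5.3, the function returns 65h, so the boot sector of the partition is found at sector 101 of the card.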

Table 5.4 FAT16 Drive Layout


Offset                                                                                                             Description
Start of Partition (located from the relative sector)                                                              Boot Sector (512 bytes in length)
Start + # of Reserved Sectors                                                                                      FAT Tables
Start + # of Reserved + (# of Sectors Per FAT * 2)                                                                 Root Directory Entry
Start + # of Reserved + (# of Sectors Per FAT * 2) + ((Maximum Root Directory Entries * 32) / Bytes per Sector)    Data Area (Starts with Cluster #2)
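The offsets in Table 5.4 translate directly into sector arithmetic. The sketch below assumes two FAT copies, as in the table; the structure and function names are hypothetical, and the parameters are the values read from the boot sector.

```c
#include <stdint.h>

/* Starting sectors of the main regions of a FAT16 partition. */
struct fat16_layout {
    uint32_t fat_start;  /* first FAT table            */
    uint32_t root_start; /* root directory             */
    uint32_t data_start; /* data area (cluster #2)     */
};

/* Apply the formulas of Table 5.4. bytes_per_sector must be non-zero. */
struct fat16_layout fat16_locate(uint32_t part_start, uint16_t reserved,
                                 uint16_t sectors_per_fat,
                                 uint16_t max_root_entries,
                                 uint16_t bytes_per_sector)
{
    struct fat16_layout l;
    l.fat_start  = part_start + reserved;
    l.root_start = l.fat_start + 2u * (uint32_t)sectors_per_fat;
    l.data_start = l.root_start +
                   ((uint32_t)max_root_entries * 32u) / bytes_per_sector;
    return l;
}
```

As an illustration with made-up boot sector values, a partition starting at sector 101 with 1 reserved sector, 200 sectors per FAT, 512 root directory entries and 512 bytes per sector has its FAT at sector 102, its root directory at sector 502 and its data area at sector 534.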

The boot sector, located at the first sector of the partition, contains other important information (partition size, number of reserved sectors, etc.) needed to access the partition. A partition is divided into many clusters. A cluster is a group of sectors on the drive. Each cluster is given a 2-byte entry in the FAT table that indicates whether or not the cluster holds data and, if so, whether it is the end of the data or is followed by another cluster.

Table 5.5 FAT Table Entries


FAT Code Range   Meaning
0000h            Available Cluster
0002h-FFEFh      Used, Next Cluster in File
FFF0h-FFF6h      Reserved Cluster
FFF7h            BAD Cluster
FFF8h-FFFFh      Used, Last Cluster in File
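The code ranges of Table 5.5 can be turned into a small classifier, and following the "next cluster" entries yields the chain of clusters that make up a file. The sketch below works on a FAT table already held in memory; the names are assumptions for this example and differ from FATentry(clust) in loader.c, which reads the entry from the card.

```c
#include <stddef.h>
#include <stdint.h>

enum fat16_kind {
    FAT16_FREE, FAT16_NEXT, FAT16_RESERVED, FAT16_BAD, FAT16_LAST,
    FAT16_INVALID /* 0001h is not listed in Table 5.5 */
};

/* Classify a FAT16 entry according to Table 5.5. */
enum fat16_kind fat16_classify(uint16_t e)
{
    if (e == 0x0000u) return FAT16_FREE;
    if (e >= 0x0002u && e <= 0xFFEFu) return FAT16_NEXT;
    if (e >= 0xFFF0u && e <= 0xFFF6u) return FAT16_RESERVED;
    if (e == 0xFFF7u) return FAT16_BAD;
    if (e >= 0xFFF8u) return FAT16_LAST;
    return FAT16_INVALID;
}

/* Count the clusters of the file starting at cluster `first`.
 * Returns 0 if the chain is broken, leaves the table or loops. */
size_t fat16_chain_length(const uint16_t *fat, size_t fat_entries,
                          uint16_t first)
{
    size_t count = 0;
    uint16_t c = first;
    while (count < fat_entries) {
        if (c < 2u || c >= fat_entries) return 0;
        count++;
        uint16_t e = fat[c];
        enum fat16_kind k = fat16_classify(e);
        if (k == FAT16_LAST) return count;
        if (k != FAT16_NEXT) return 0;
        c = e; /* entry value is the next cluster number */
    }
    return 0;
}
```

For a table in which cluster 2 points to 3, cluster 3 points to 5 and cluster 5 holds FFFFh, the chain starting at cluster 2 has a length of 3 clusters.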


Another important data structure in a FAT16 drive is the directory table, which stores information such as the filename, the file size and the number of the cluster where the file starts.

Table 5.6 Directory Entry for 8.3 Filenames


Offset  Length   Value
0       8 bytes  Name
8       3 bytes  Extension
11      1 byte   Attribute (00ADVSHR)
                 0: unused bit, A: archive bit, D: directory bit, V: volume bit,
                 S: system bit, H: hidden bit, R: read-only bit
                 *Long file name entries have bits R, H, S and V set to 1
22      2 bytes  Time
24      2 bytes  Date
26      2 bytes  Cluster number (starting cluster of file)
28      4 bytes  File Size

The directory table is a sequence of 32-byte entries containing file information. For the traditional DOS 8.3 filenames, each 32-byte entry in the directory table is organized as shown in Table 5.6. Since the newer long file name format is designed to be compatible with older operating systems, directory entries for long file names can be safely ignored by the boot loader program.
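Matching one directory entry against an 8.3 filename can be sketched as below, using the offsets of Table 5.6. The names are assumptions for this example and this is not the searchFile() routine of loader.c. The sketch relies on the FAT convention that name and extension are stored without the dot and padded with spaces, so PROGRAM.BIN is stored as the 11 characters "PROGRAM BIN".

```c
#include <stdint.h>
#include <string.h>

#define ATTR_LONG_NAME 0x0Fu /* R, H, S and V bits all set */

struct dir_info {
    uint16_t first_cluster; /* starting cluster of the file */
    uint32_t size;          /* file size in bytes           */
};

/* Return 1 and fill *out if the 32-byte entry is a short-name entry whose
 * 11-character name field equals name83; return 0 otherwise.
 * Long file name entries are skipped. */
int dir_entry_match(const uint8_t entry[32], const char name83[11],
                    struct dir_info *out)
{
    if ((entry[11] & ATTR_LONG_NAME) == ATTR_LONG_NAME) {
        return 0; /* long file name entry: ignore */
    }
    if (memcmp(entry, name83, 11) != 0) {
        return 0;
    }
    /* Multi-byte fields are stored least significant byte first. */
    out->first_cluster = (uint16_t)(entry[26] | (entry[27] << 8));
    out->size = (uint32_t)entry[28] | ((uint32_t)entry[29] << 8) |
                ((uint32_t)entry[30] << 16) | ((uint32_t)entry[31] << 24);
    return 1;
}
```

A boot loader would apply such a check to each 32-byte entry of the root directory in turn until the entry for PROGRAM.BIN is found.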


5.5

Boot Loader Program

Initialize s8051
- set up timer 0 to interrupt at 30.34kHz
- set up direction of I/O ports

Initialize SD Card to SPI mode
- s8051 sends CMD0 to reset the SD Card
- the host polls the card (by repeatedly sending CMD1) until the SD Card is no longer in idle state

Read 1st sector (MBR) of SD Card to obtain the relative sector of the FAT16 partition

Read 1st sector (Boot Record) of FAT16 partition to get the information required to calculate the location of the Root Directory

Search through Root Directory for PROGRAM.BIN

If the file exists, load it into external memory and set PCON(6); otherwise loop forever

Figure 5.9 Boot Loader (loader.c) Program Flow
The SD Card boot loader program can be found in Appendix I (loader.c). The SPI bus functions are implemented with the run_spi() function and the timer 0 interrupt. The timer 0 interrupt generates the required frequency for the serial clock. The run_spi() function shifts 8 bits through the spi_buff variable: the initial content is shifted out on MOSI and replaced by the bit stream from MISO. The timer 0 interrupt frequency should not be set too high, as ample time must be allowed for the code to shift a bit out on MOSI and toggle SCK before the next timer 0 interrupt occurs. The appropriate timer 0 reload value is determined by studying the simulation waveform to find the actual number of machine cycles needed to execute the code in run_spi(). There should be no concern about violating the SD Card's timing, as SD Cards operate at frequencies in the MHz range, which the s8051 cannot reach under normal conditions unless a very high clock rate is used.

Figure 5.10 Simulation Waveform of SD Card Reset Command (loader.c)

Figure 5.10 shows the post-layout simulation waveform of the s8051 running loader.c while the SD Card reset command (CMD0) is being sent through the SPI bus. The CMD0 sequence is 40 00 00 00 00 95 (hexadecimal). Two functions, sdInit() and sdRdBlk(addr), are used to initialize the SD Card to SPI mode and to read a single block from the location addressed by the addr variable. The FAT16 functions are implemented in getBootRecord(), FATentry(clust), searchFile() and loadFile(). As its name suggests, getBootRecord() is called to retrieve the boot record from the FAT16 partition. The boot record is used to determine the location of the root directory for that partition.

FATentry(clust) returns the FAT table entry (the next cluster number of the file) pointed to by clust.

searchFile(filename) searches through the root directory for the file whose name equals filename and stores its 32-byte file entry, which contains the file size and starting cluster number. Finally, loadFile() loads the file referenced by the stored file entry into the external memory.

5.6

Bluetooth Serial Port Profile

As discussed in earlier chapters, part of the requirement of the ECG device is the ability to send ECG data for live monitoring on a PC or handheld device. Bluetooth technology is well suited to this application because of its low power and its ability to replace cables. The Casira Bluetooth development kit [17] has been purchased to develop a serial port profile. The Casira BlueLab [18] is a software development kit for developing applications that run embedded on the BlueCore from CSR, and it provides an easy way to write Bluetooth applications. The application running on a Virtual Machine (VM) in the BlueCore communicates directly with a host controller such as an 8051, removing the need to follow any Bluetooth Host Controller Interface communication protocol. The serial port profile sample application [18, 19] provided with BlueLab can be used for our purpose, as it simply redirects any data between the BlueCore's UART bus and a Bluetooth-enabled PC or handheld device.


The Bluetooth module on the PC or handheld emulates a virtual serial port. Therefore, the 8051 only needs to exchange the data it wishes to send to or receive from the PC with the BlueCore's UART, as if it were connected directly to the PC's serial port via a cable.

[Diagram labels: s8051 (DTE) with RTS, CTS, TX and RX lines; BCM-03 Bluetooth module (DCE) with RTS, CTS, TX, RX, GND and PIO5 (unpair device); Bluetooth link; Bluetooth enabled PC/Handheld]

Figure 5.11 Bluetooth Connection Between s8051 and Bluetooth Enabled PC through the Bluetooth Serial Port Service


CHAPTER 6 CONCLUSION

6.1

Conclusion

The objective of this project is to develop the digital building blocks needed to implement an ECG device. It has been completed successfully. The results and outcomes of this project are summarized below:

1. s8051 Microcontroller
An Intel 8051 compatible microcontroller has been developed and tested to be working flawlessly with a 19.6608MHz oscillator module. The s8051 designed in this project is based on UCR Dalton's 8051 model (c8051) with several enhancements. The built-in timer, serial port and interrupt handling are implemented according to the Intel 8051 specifications to ensure compatibility with existing 8051 compilers. The architecture redesign gives the s8051 a throughput of 2.5MIPS at 19.7MHz, which compares favorably with the c8051 (1.2MIPS) and the i8051 (1.63MIPS), and it has fewer long-cycle instructions than the original i8051 instruction set. Simulation results confirm that the s8051 has enough processing capability to implement an ECG device, with 50% utilization for SD card interfacing and the FAT16 file format and 50% for data acquisition and other miscellaneous processing at 200Hz sampling of 3-lead ECG data.


2. Development Platform and Testing of s8051
Development tools, namely a programmer board, programmer software, a serial port terminal and a development board, have been developed to debug and develop applications for the s8051 microcontroller. The s8051 chip is tested by loading testall.c (a benchmark program that tests most of the 8051 instruction set) and by developing other firmware to test other features and aspects of the s8051. The s8051, which we sent for fabrication in July 2004, passed all the instructions tested with testall.c at clock frequencies up to 80MHz.

3. Firmware Development
Because the analog data acquisition portion (prepared by another research scholar at the university, pending chip delivery from the vendor) was unavailable at the time the project was carried out, an ECG machine firmware has not been developed. Instead, an SD Card loader firmware, which can be conveniently modified to include ECG data acquisition, has been developed. The FAT16 file format allows convenient data exchange between the ECG device and a Windows-based PC.

4. ECG Chip
The analog data acquisition portion implemented by another Research Scholar was combined with the s8051 of this project and sent for fabrication in April 2005.

Together, all the above mentioned components form the required Intellectual Properties to implement a portable ECG device.


6.2

Problems Encountered and Solutions

Many problems and difficulties were encountered during the course of this project. Most of them were due to a lack of experience in the IC design field during the initial stage of the project. As there are not many Research Scholars in NUS dealing with digital IC design, few resources and references are available. I am grateful that the engineers at CMP were kind enough to offer their assistance when needed. Initially, there were difficulties sourcing an external ROM or flash memory chip (with 3.3V operating voltage) that could meet the low access time requirement of the s8051 memory interface, as such high speed ROM/flash memories were not available from any of the local electronics distributors (e.g. Farnell, RS). To resolve this, the access time requirement was relaxed by redesigning the memory interface to allow 3 clock cycles of access time (as outlined in chapter two of this report). Problems were also encountered during the initial stage of debugging the s8051 chip. Development and debugging tools were not available at the time the s8051 chip was sent for fabrication, so these tools were built from scratch. Other problems encountered were mainly due to human error. For example, when the s8051 chip was tested, bit 5 of the data bus was not connected properly and the chip appeared not to work. The fault was found by running the chip at a very low clock frequency (a few Hz) and manually checking the address and data buses for inconsistencies with the simulated results, which identified bit 5 of the data bus as the cause.


6.3

Future Improvements and Recommendations

The following are recommendations for future improvements:
- The actual ECG firmware can be developed once the final chip sent for tapeout, which includes the data acquisition portion, is available.
- The SD Card function implemented in this project only covers reading information from the SD Card. To allow the ECG device to write ECG data to an SD Card, the firmware developed in this project can be modified to include this feature.
- A dedicated SPI controller block or SDIO controller can be integrated into the ECG chip to lessen the burden on the 8051.
- To further improve the performance of the s8051, the s8051 state machines could be pipelined further, to four stages, reducing the CPI from eight to four.
- The use of newer CAD tools such as Synopsys Power Compiler can lower power consumption by using gated clocks to disable clock switching in portions of the chip that are not exercised.
- The ALU sub-block of the s8051 design is implemented as an asynchronous block. However, the Synopsys Design Compiler (synthesis tool) cannot automatically synthesize or verify the behavior of asynchronous logic. The ALU sub-block can be redesigned as a synchronous block, but the whole s8051 design may need to change drastically in order to do so.
- Internal programmable memory can be added to the s8051 design in the future, removing the need for an external flash memory component and reducing the number of pins needed to interface with an external memory device. Currently, the IP from AMS for implementing one-time programmable ROM is prohibitively expensive.
- Alternatively, enough internal SRAM may be included in the s8051 design so that programs can be loaded into internal RAM instead of an external RAM chip. However, this may significantly increase the final area of the chip and the cost of fabricating it.


REFERENCES

[1] VLSI Implementation and Testing of 8-bit microprocessor, 2003
[2] Intel, MCS 51 Microcontroller Family User's Manual, 1994
[3] Austriamicrosystems, 0.35µm CMOS Digital Standard Cell Databook
[4] ST, M29W010B 1 Mbit (128Kb x8, Uniform Block) Low Voltage Single Supply Flash Memory
[5] Maxim, 1µA Supply-Current, True +3V to +5.5V RS-232 Transceivers with AutoShutdown
[6] Synopsys, Design Compiler User Guide
[7] Cadence, Silicon Ensemble Reference Manual
[8] AMS, Silicon Ensemble Guide
[9] Synopsys, Prime Time User Guide
[10] Digital IC Design Lab Manual, rev 1.1, prepared by Jiang Bin, revised by Yong Sheue Fen, 2002, ECE, National University of Singapore
[11] AMS, 0.35um Static RAM Datasheet
[12] Memec Design, Spartan-IIE LC Development Board User's Guide, 2003
[13] Microsoft, FAT: General Overview of On-Disk Format, Version 1.02, 1999
[14] SanDisk Corp., SanDisk Secure Digital Card Product Manual, Version 1.9, December 2003
[15] The UCR Dalton Project, Synthesizable VHDL model of 8051, www.cs.ucr.edu/~dalton/i8051/i8051syn, 2001
[16] SDCC Compiler User Guide
[17] CSR, BlueCore Casira User Guide
[18] CSR, BlueLab Professional 2.82 guide
[19] CSR, Serial Port Profile
[20] Thaddeus R. F. Fulford-Jones, Gu-Yeon Wei, Matt Welsh, A Portable, Low-Power, Wireless Two-Lead EKG System, Proceedings of the 26th Annual International Conference of the IEEE EMBS
[21] Konrad Lorincz, David J. Malan, Thaddeus R. F. Fulford-Jones, Alan Nawoj, Antony Clavel, Victor Shnayder, Geoffrey Mainland, and Matt Welsh, Sensor Networks for Emergency Response: Challenges and Opportunities, Pervasive Computing, IEEE CS and IEEE ComSoc
[22] Tia Gao, Dan Greenspan, Matt Welsh, Radford R. Juang, and Alex Alm, Vital Signs Monitoring and Patient Tracking Over a Wireless Network, Proceedings of the 27th Annual International Conference of the IEEE EMBS
[23] Jing Bai, Yonghong Zhang, Delin Shen, Lingteng Wen, Chuxiong Ding, Zijing Cui, Fenghuo Tian, Bo Yu, Bing Dai, Jupeng Zhang, A Portable ECG and Blood Pressure Telemonitoring System, IEEE Engineering in Medicine and Biology, 1999
[24] Tommi Raita-aho, Tapio Saramäki, and Olli Vainio, A Digital Filter Chip for ECG Signal Processing, IEEE Transactions on Instrumentation and Measurement, Vol. 43, No. 4, August 1994
[25] Byung S. Kim, Sun K. Yoo, and Moon H. Lee, Wavelet-Based Low-Delay ECG Compression Algorithm for Continuous ECG Transmission, IEEE Transactions on Information Technology in Biomedicine, Vol. 10, No. 1, January 2006, p. 77
[26] M. Edward Womble, John S. Halliday, Sanjoy K. Mitter, Malcolm C. Lancaster, and John H. Triebwasser, Data Compression for Storing and Transmitting ECGs/VCGs, Proceedings of the IEEE, Vol. 65, No. 5, May 1977
[27] Michel Bertrand, Robert Guardo, Fernand A. Roberge, and Pierre Blondeau, Microprocessor Application for Numerical ECG Encoding and Transmission
[28] Ross Bannatyne & Greg Viot, Introduction to Microcontrollers, Motorola Semiconductor Products Sector Transportation Division, Austin, Texas
[29] Kai Hwang, Advanced Computer Architecture: Parallelism, Scalability, Programmability, McGraw-Hill
[30] Daniel S. Query and Gary Tescher, Advantages of Microprocessors versus Discrete Digital Electronics in Appliance Controls, IEEE Transactions on Industry Applications, Vol. 26, No. 6, Nov/Dec 1990
[31] Yuan-Long Jeang, Gwo-Yang Wu, and Liang-Bi Chen, A Microcontroller IP Generator for SOC Platform, 2004 IEEE Asia-Pacific Conference on Advance System Integrated Circuits, Aug. 4-5, 2004
[32] Tsai Chi Huang, Sudhakar Yalamanchili, Roy W. Melton, Philip R. Bingham, and Cecil O. Alford, Teaching Pipelining and Concurrency using Hardware Description Languages, Georgia Institute of Technology, Atlanta, Georgia


APPENDIX A .synopsys_dc.setup
/* Owner: austriamicrosystems AG HIT-Kit: Digital */
AMS_DIR = /app11/AMS_3.51_CDS_F
SYNOPSYS = /app11/synopsys
search_path = {"." \
    AMS_DIR + "/synopsys/c35_3.3V" \
    SYNOPSYS + "/synthesis/libraries/syn" \
    SYNOPSYS + "/synthesis/dw/sim_ver"}
symbol_library = {c35_CORELIB.sdb c35_IOLIB_4M.sdb}
target_library = {c35_CORELIB.db c35_IOLIB_4M.db};

/*** use following target-library for 3bus cells */
/* target_library = {c35_CORELIB_3B.db c35_IOLIB_3B_4M.db }; */
/* symbol_library = {c35_CORELIB_3B.sdb c35_IOLIB_3B_4M.sdb }; */
synthetic_library = {/app11/synopsys/synthesis/libraries/syn/dw_foundation.sldb /proj1/pg/p31324r/memory/synopsys/sram128x8.db};
link_library = "*" + target_library + synthetic_library;
verilogout_equation = "false";
write_name_nets_same_as_ports = "true";
verilogout_single_bit = "false";
hdlout_internal_busses = "true";
bus_inference_style = "%s[%d]";
sdfout_no_edge = "true";


APPENDIX B dc_script.scr
analyze -format vhdl -lib WORK {"/var/tmp/weiseng/3003/syn/i8051_lib.vhd"};
analyze -format vhdl -lib WORK {"/var/tmp/weiseng/3003/syn/reset_logic.vhd"};
analyze -format vhdl -lib WORK {"/var/tmp/weiseng/3003/syn/mem_interface.vhd"};
analyze -format vhdl -lib WORK {"/var/tmp/weiseng/3003/syn/loader.vhd"};
analyze -format vhdl -lib WORK {"/var/tmp/weiseng/3003/syn/i8051_timer.vhd"};
analyze -format vhdl -lib WORK {"/var/tmp/weiseng/3003/syn/i8051_serial.vhd"};
analyze -format vhdl -lib WORK {"/var/tmp/weiseng/3003/syn/i8051_ram.vhd"};
analyze -format vhdl -lib WORK {"/var/tmp/weiseng/3003/syn/i8051_dec.vhd"};
analyze -format vhdl -lib WORK {"/var/tmp/weiseng/3003/syn/i8051_ctr.vhd"};
analyze -format vhdl -lib WORK {"/var/tmp/weiseng/3003/syn/i8051_alu.vhd"};
analyze -format vhdl -lib WORK {"/var/tmp/weiseng/3003/syn/i8051_all.vhd"};
analyze -format vhdl -lib WORK {"/var/tmp/weiseng/3003/syn/s8051.vhd"};
elaborate s8051 -arch "Behavioral" -lib WORK -update;

current_design s8051;
reset_design;
create_clock -period 20 -name clk find(port, clk);
set_dont_touch_network find(port, clk);
set_clock_uncertainty -setup 0.3 find(port, clk);
set_clock_uncertainty -hold 0.3 find(port, clk);
set_operating_conditions -lib /app11/AMS_3.51_CDS_F/synopsys/c35_3.3V/c35_CORELIB.db:c35_CORELIB -max WORST
set_false_path -from rst;
set_wire_load_model -lib /app11/AMS_3.51_CDS_F/synopsys/c35_3.3V/c35_CORELIB.db:c35_CORELIB -name 10k;
set_wire_load_mode enclosed;
set_input_delay -max 10 -clock clk {data};
set_input_delay -min 1 -clock clk {data};

set_output_delay -max 3 -clock clk {addr,rd,wr,data};
set_output_delay -min 0 -clock clk {addr,rd,wr,data};

remove_input_delay find(port, clk);
set_load 0.1 all_outputs();
set_fix_multiple_port_nets -feedthroughs -constants;


APPENDIX C Synthesis Report


****************************************
Report : area
Design : s8051
Version: 2002.05-SP2
Date   : Wed Mar 30 23:59:20 2005
****************************************

Library(s) Used:
    c35_IOLIB_4M (File: /app11/AMS_3.51_CDS_F/synopsys/c35_3.3V/c35_IOLIB_4M.db)
    sram128x8 (File: /proj1/pg/p31324r/memory/synopsys/sram128x8.db)
    c35_CORELIB (File: /app11/AMS_3.51_CDS_F/synopsys/c35_3.3V/c35_CORELIB.db)

Number of ports:              55
Number of nets:              148
Number of cells:              57
Number of references:          8

Combinational area:       2707046.250000
Noncombinational area:     441926.500000
Net Interconnect area:     277803.000000

Total cell area:          3149103.750000
Total area:               3426775.750000
1

****************************************
Report : power -analysis_effort low
Design : s8051
Version: 2002.05-SP2
Date   : Thu Mar 31 00:02:20 2005
****************************************

Library(s) Used:
    c35_IOLIB_4M (File: /app11/AMS_3.51_CDS_F/synopsys/c35_3.3V/c35_IOLIB_4M.db)
    sram128x8 (File: /proj1/pg/p31324r/memory/synopsys/sram128x8.db)
    c35_CORELIB (File: /app11/AMS_3.51_CDS_F/synopsys/c35_3.3V/c35_CORELIB.db)

Warning: The library cells used by your design are not characterized for internal power. (PWR-26)

Operating Conditions: WORST   Library: c35_CORELIB
Wire Load Model Mode: enclosed

Design                       Wire Load Model   Library
------------------------------------------------
s8051                        10k               c35_CORELIB
I8051_ALL                    10k               c35_CORELIB
mem_interface                10k               c35_CORELIB
INT_ROM                      10k               c35_CORELIB
I8051_CTR                    10k               c35_CORELIB
I8051_CTR_DW01_inc_16_1      10k               c35_CORELIB
I8051_CTR_DW01_inc_16_2      10k               c35_CORELIB
reset_logic                  10k               c35_CORELIB
I8051_DEC                    10k               c35_CORELIB
I8051_RAM                    10k               c35_CORELIB
i8051_timer                  10k               c35_CORELIB
i8051_timer_DW01_inc_16_1    10k               c35_CORELIB
i8051_timer_DW01_inc_16_2    10k               c35_CORELIB
i8051_timer_DW01_inc_8_1     10k               c35_CORELIB
i8051_serial                 10k               c35_CORELIB
i8051_serial_DW01_add_6_1    10k               c35_CORELIB
i8051_serial_DW01_add_6_2    10k               c35_CORELIB
I8051_ALU                    10k               c35_CORELIB
I8051_ALU_DW01_add_9_1       10k               c35_CORELIB
I8051_ALU_DW01_sub_5_1       10k               c35_CORELIB
I8051_ALU_DW01_add_16_1      10k               c35_CORELIB
I8051_ALU_DW01_add_16_2      10k               c35_CORELIB
I8051_ALU_DW01_add_5_1       10k               c35_CORELIB
I8051_ALU_DW01_addsub_9_1    10k               c35_CORELIB
I8051_ALU_DW02_mult_8_8_1    10k               c35_CORELIB
I8051_ALU_DW01_add_11_0      10k               c35_CORELIB
I8051_ALU_DW01_sub_9_5       10k               c35_CORELIB
I8051_ALU_DW01_sub_9_2       10k               c35_CORELIB
I8051_ALU_DW01_sub_9_6       10k               c35_CORELIB
I8051_ALU_DW01_sub_9_4       10k               c35_CORELIB
I8051_ALU_DW01_sub_9_0       10k               c35_CORELIB
I8051_ALU_DW01_sub_9_7       10k               c35_CORELIB
I8051_ALU_DW01_sub_9_3       10k               c35_CORELIB
I8051_ALU_DW01_sub_9_1       10k               c35_CORELIB

Global Operating Voltage = 3
Power-specific unit information :
    Voltage Units = 1V
    Capacitance Units = 1.000000pf
    Time Units = 1ns
    Dynamic Power Units = 1mW (derived from V,C,T units)
    Leakage Power Units = Unitless

Cell Internal Power  =  0.0000 mW   (0%)
Net Switching Power  = 15.9133 mW (100%)
                       ---------
Total Dynamic Power  = 15.9133 mW (100%)
Cell Leakage Power   =  0.0000

1
design_analyzer>
****************************************
Report : timing -path full -delay max -max_paths 1
Design : s8051
Version: 2002.05-SP2
Date   : Thu Mar 31 00:02:21 2005
****************************************

Operating Conditions: WORST   Library: c35_CORELIB
Wire Load Model Mode: enclosed

Startpoint: U_I8051/U_CTR/alu_src_2_reg[1] (rising edge-triggered flip-flop clocked by clk)
Endpoint: U_I8051/U_CTR/reg_cy_reg (rising edge-triggered flip-flop clocked by clk)
Path Group: clk
Path Type: max

Des/Clust/Port            Wire Load Model   Library
------------------------------------------------
s8051                     10k               c35_CORELIB
I8051_ALU                 10k               c35_CORELIB
I8051_ALL                 10k               c35_CORELIB
I8051_ALU_DW01_sub_9_4    10k               c35_CORELIB
I8051_ALU_DW01_sub_9_3    10k               c35_CORELIB
I8051_ALU_DW01_sub_9_5    10k               c35_CORELIB
I8051_CTR                 10k               c35_CORELIB
I8051_ALU_DW01_sub_9_2    10k               c35_CORELIB
I8051_ALU_DW01_sub_9_6    10k               c35_CORELIB
I8051_ALU_DW01_sub_9_0    10k               c35_CORELIB
I8051_ALU_DW01_sub_9_7    10k               c35_CORELIB
I8051_ALU_DW01_sub_9_1    10k               c35_CORELIB

Point                                                    Incr    Path
--------------------------------------------------------------------------
clock clk (rise edge)  0.00  0.00
clock network delay (ideal)  0.00  0.00
U_I8051/U_CTR/alu_src_2_reg[1]/C (DFE3)  0.00  0.00 r
U_I8051/U_CTR/alu_src_2_reg[1]/Q (DFE3)  1.07  1.07 r
U_I8051/U_CTR/alu_src_2[1] (I8051_CTR)  0.00  1.07 r
U_I8051/U_ALU/src_2[1] (I8051_ALU)  0.00  1.07 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus/B[1] (I8051_ALU_DW01_sub_9_4)  0.00  1.07 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus/U23/Q (INV10)  0.12  1.19 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus/U21/Q (NAND28)  0.28  1.47 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus/U18/Q (INV10)  0.09  1.55 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus/U15/Q (NAND28)  0.16  1.71 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus/U16/Q (NOR33)  0.32  2.04 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus/DIFF[8] (I8051_ALU_DW01_sub_9_4)  0.00  2.04 f
U_I8051/U_ALU/U678/Q (CLKIN12)  0.25  2.29 r
U_I8051/U_ALU/U679/Q (INV12)  0.08  2.37 f
U_I8051/U_ALU/U681/Q (IMUX24)  0.44  2.81 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_274/A[1] (I8051_ALU_DW01_sub_9_3)  0.00  2.81 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_274/U43/Q (INV15)  0.24  3.05 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_274/U44/Q (NOR23)  0.15  3.20 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_274/U1_4_0_1/Q (OAI212)  0.37  3.57 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_274/U41/Q (NAND26)  0.05  3.62 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_274/U38/Q (NAND28)  0.21  3.83 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_274/U39/Q (INV6)  0.06  3.89 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_274/U65/Q (OAI2112)  0.39  4.28 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_274/U66/Q (INV12)  0.20  4.48 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_274/DIFF[8] (I8051_ALU_DW01_sub_9_3)  0.00  4.48 f
U_I8051/U_ALU/U673/Q (MUX26)  0.66  5.14 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_275/A[2] (I8051_ALU_DW01_sub_9_2)  0.00  5.14 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_275/U49/Q (INV6)  0.10  5.24 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_275/U47/Q (NAND28)  0.21  5.46 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_275/*cell*31838/syn260358/Q (NAND28)  0.04  5.50 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_275/*cell*31838/syn259341/Q (NOR24)  0.19  5.69 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_275/*cell*31838/syn260389/Q (NOR24)  0.14  5.83 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_275/U32/Q (NAND26)  0.23  6.06 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_275/U30/Q (NAND24)  0.05  6.11 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_275/*cell*31838/syn260148/Q (AOI312)  0.30  6.41 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_275/U56/Q (BUF15)  0.44  6.86 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_275/DIFF[8] (I8051_ALU_DW01_sub_9_2)  0.00  6.86 r
U_I8051/U_ALU/U706/Q (IMUX24)  0.57  7.43 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_276/A[2] (I8051_ALU_DW01_sub_9_6)  0.00  7.43 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_276/U22/Q (CLKIN6)  0.12  7.55 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_276/U21/Q (NAND24)  0.23  7.78 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_276/U18/Q (NAND34)  0.09  7.87 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_276/U17/Q (NAND34)  0.33  8.19 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_276/U16/Q (NAND24)  0.03  8.22 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_276/*cell*30666/syn243601/Q (AOI312)  0.31  8.53 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_276/*cell*30667/Q (BUF15)  0.43  8.96 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_276/DIFF[8] (I8051_ALU_DW01_sub_9_6)  0.00  8.96 r
U_I8051/U_ALU/U692/Q (INV8)  0.14  9.10 f
U_I8051/U_ALU/U694/Q (IMUX24)  0.48  9.58 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_277/A[7] (I8051_ALU_DW01_sub_9_5)  0.00  9.58 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_277/U52/Q (NAND28)  0.37  9.95 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_277/U49/Q (NAND24)  0.05  9.99 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_277/*cell*32303/syn264579/Q (OAI2112)  0.42  10.42 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_277/*cell*32303/U_I8051/U_ALU/*cell*32200/C[1][2][0]/Q (AOI212)  0.25  10.67 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_277/U45/Q (CLKBU15)  0.51  11.17 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_277/DIFF[8] (I8051_ALU_DW01_sub_9_5)  0.00  11.17 f
U_I8051/U_ALU/U701/Q (MUX24)  0.65  11.82 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_278/A[4] (I8051_ALU_DW01_sub_9_0)  0.00  11.82 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_278/U31/Q (NOR24)  0.20  12.02 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_278/U29/Q (NAND28)  0.36  12.38 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_278/U27/Q (NOR24)  0.14  12.52 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_278/U28/Q (NOR24)  0.19  12.72 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_278/U53/Q (NAND24)  0.05  12.77 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_278/U54/Q (AOI312)  0.35  13.12 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_278/U60/Q (BUF15)  0.41  13.53 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_278/DIFF[8] (I8051_ALU_DW01_sub_9_0)  0.00  13.53 r
U_I8051/U_ALU/U698/Q (MUX26)  0.57  14.10 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_279/A[3] (I8051_ALU_DW01_sub_9_1)  0.00  14.10 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_279/U43/Q (CLKIN12)  0.17  14.28 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_279/U42/Q (NOR23)  0.14  14.42 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_279/U1_4_0_3/Q (OAI212)  0.40  14.82 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_279/U1_4_1_3/Q (AOI212)  0.24  15.06 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_279/U1_4_2_7/Q (OAI212)  0.36  15.42 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_279/U1_4_3_7/Q (AOI212)  0.24  15.66 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_279/U30/Q (BUF15)  0.50  16.16 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_279/DIFF[8] (I8051_ALU_DW01_sub_9_1)  0.00  16.16 f
U_I8051/U_ALU/U805/Q (MUX26)  0.62  16.77 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_280/A[6] (I8051_ALU_DW01_sub_9_7)  0.00  16.77 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_280/U8/Q (INV12)  0.09  16.87 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_280/U48/Q (NAND28)  0.20  17.07 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_280/U6/Q (NAND26)  0.07  17.14 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_280/U123/Q (NOR24)  0.20  17.33 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_280/U85/Q (NOR24)  0.17  17.50 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_280/U43/Q (NAND34)  0.28  17.78 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_280/U40/Q (NAND28)  0.04  17.83 f
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_280/U41/Q (NAND28)  0.21  18.04 r
U_I8051/U_ALU/DO_DIV_303/sub_219/minus_280/DIFF[8] (I8051_ALU_DW01_sub_9_7)  0.00  18.04 r
U_I8051/U_ALU/U711/Q (CLKIN12)  0.08  18.12 f
U_I8051/U_ALU/U708/Q (NAND28)  0.15  18.28 r
U_I8051/U_ALU/U709/Q (NAND28)  0.09  18.37 f
U_I8051/U_ALU/U723/Q (INV15)  0.19  18.55 r
U_I8051/U_ALU/des_1[0] (I8051_ALU)  0.00  18.55 r
U_I8051/U_CTR/alu_des_1[0] (I8051_CTR)  0.00  18.55 r
U_I8051/U_CTR/U23940/Q (INV15)  0.13  18.69 f
U_I8051/U_CTR/U21458/Q (NAND32)  0.26  18.95 r
U_I8051/U_CTR/U24652/Q (OAI212)  0.18  19.13 f
U_I8051/U_CTR/U24602/Q (AOI312)  0.44  19.57 r
U_I8051/U_CTR/U24624/Q (NOR24)  0.17  19.73 f
U_I8051/U_CTR/U24384/Q (IMUX24)  0.31  20.04 r
U_I8051/U_CTR/U24429/Q (NAND28)  0.00  20.04 f
U_I8051/U_CTR/U24476/Q (MUX24)  0.34  20.39 f
U_I8051/U_CTR/U21450/Q (NAND24)  0.16  20.55 r
U_I8051/U_CTR/reg_cy_reg/D (DFE3)  0.00  20.55 r
data arrival time  20.55

clock clk (rise edge)  20.00  20.00
clock network delay (ideal)  0.00  20.00
clock uncertainty  -0.30  19.70
U_I8051/U_CTR/reg_cy_reg/C (DFE3)  0.00  19.70 r
library setup time  -0.15  19.55
data required time  19.55
--------------------------------------------------------------------------
data required time  19.55
data arrival time  -20.55
--------------------------------------------------------------------------
slack (VIOLATED)  -0.99
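The slack on the last line of the timing report follows from the figures printed above it: the required time is the clock period less the clock uncertainty and the library setup time, and the slack is the required time less the data arrival time. The sketch below redoes that arithmetic in C using integer picoseconds. The function names are illustrative only and do not belong to any Synopsys tool. With the rounded values shown in the report it yields -1.00 ns against the reported -0.99 ns, presumably because the tool rounds each displayed figure separately.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helpers (hypothetical names); all times in picoseconds. */

/* Latest time at which data may arrive at the capturing flip-flop. */
int32_t required_time_ps(int32_t period_ps, int32_t uncertainty_ps, int32_t setup_ps)
{
    assert(period_ps > 0);
    return period_ps - uncertainty_ps - setup_ps;
}

/* Setup slack: a negative value means the path violates timing. */
int32_t setup_slack_ps(int32_t required_ps, int32_t arrival_ps)
{
    return required_ps - arrival_ps;
}

/* Shortest clock period at which the same path would just meet timing. */
int32_t min_period_ps(int32_t arrival_ps, int32_t uncertainty_ps, int32_t setup_ps)
{
    return arrival_ps + uncertainty_ps + setup_ps;
}
```

With the report's values (period 20000 ps, uncertainty 300 ps, setup 150 ps, arrival 20550 ps) the required time is 19550 ps, the slack is -1000 ps, and the minimum period is 21000 ps, which corresponds to roughly 47.6 MHz.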


APPENDIX D gemma.mac
##--------------------------------------------------------
##--
##-- Silicon Ensemble Macro File Template
##--
##--------------------------------------------------------
##-- set colors to make vias visible
##--
set v DRAW.SWIRE.LAYERSET "1 2 3 4 5 6" ;
set v DRAW.WIRE.LAYERSET "1 2 3 4 5 6" ;
set v DRAW.SWIRE.4.COLOR 5 ;
set v DRAW.SWIRE.5.COLOR 6 ;
set v DRAW.WIRE.4.COLOR 5 ;
set v DRAW.WIRE.5.COLOR 6 ;

set v DRAW.LAYER.ORDER "4 1 2 3 5 6";;
##-- Set Off Congestion Map Drawing
SET VAR DRAW.SCORE.GRAPHICS.AT OFF ;
##-- Set Design Directory
##--
SET VAR DB.DESIGN.DIR "./DB" ;
SET VAR VERIFY.TECHNOLOGY.MIN.FEATURESIZE 5 ;
SET VAR SROUTE.VIA.SNAPMANUFACTURINGGRID TRUE ;
SET VAR SROUTE.STRIPE.SNAP.RGRID "GRID" ;
SET VAR HYPEREXTRACT.RULES.FILE "/app11/AMS_3.51_CDS_F/artist/HK_C35/LEF/c35b4/c35b4_he.rules";
##-- Set Variables for VERILOG Import
##--
SET VAR INPUT.VERILOG.CREATE.IO.PINS FALSE ;
## SET VAR INPUT.VERILOG.ADD.LEADING.DELIM FALSE ;
set var INPUT.VERILOG.GROUND.NET "gnd! gnd3r! gnd3o!" ;
set var INPUT.VERILOG.POWER.NET "vdd! vdd3o! vdd3r1! vdd3r2!" ;
set var INPUT.VERILOG.SPECIAL.NETS "vdd! vdd3o! vdd3r1! vdd3r2! gnd! gnd3o! gnd3r!" ;
set var INPUT.VERILOG.LOGIC.0.NET gnd! ;
set var INPUT.VERILOG.LOGIC.1.NET vdd! ;

##-- Import Library Data
##-- LEF
FINPUT LEF F /app11/AMS_3.51_CDS_F/artist/HK_C35/LEF/c35b4/c35b4.lef ;
INPUT LEF F /app11/AMS_3.51_CDS_F/artist/HK_C35/LEF/c35b4/CORELIB.lef ;
INPUT LEF F /app11/AMS_3.51_CDS_F/artist/HK_C35/LEF/c35b4/IOLIB_4M.lef ;
INPUT LEF F /class2/ug2/e05699a1/memory/cadence/sram128x8.lef ;
##-- CTLF Timing
##-- GCF File
INPUT CTLF INITFILE "./c35b43.3V.gcf" ;
##-- Import Design Data
##-- Verilog
INPUT VERILOG FILE /app11/AMS_3.51_CDS_F/verilog/c35b4/c35_CORELIB.v LIB DesignLib ;
INPUT VERILOG FILE /app11/AMS_3.51_CDS_F/verilog/c35b4/c35_CORELIB_3B.v LIB DesignLib ;
INPUT VERILOG FILE /app11/AMS_3.51_CDS_F/verilog/c35b4/c35_IOLIB_4M.v LIB DesignLib ;
INPUT VERILOG FILE /class2/ug2/e05699a1/memory/cadence/sram128x8.v LIB DesignLib ;
#INPUT VERILOG FILE /app11/AMS_3.51_CDS_F/verilog/c35b4/c35_IOLIBV5_4M.v LIB DesignLib ;
#INPUT VERILOG FILE /app11/AMS_3.51_CDS_F/verilog/c35b4/c35_IOLIB_3B_4M.v LIB DesignLib ;
INPUT VERILOG FILE ./VERILOG/s8051.v LIB DesignLib REFLIB "DesignLib" DESIGN DesignLib.s8051:hdl ;
##-- Import Timing Contraints
##-- INPUT GCF FILENAME "./constr3.3V.gcf" REPORTFILE "importgcf.rpt" ;
##-- Add Additional Cells
##-- Corner and Power Cells
INPUT DEF F ./DEF/power_corner.def ;

##-- Define Clock Nets
##--
change net 'clk' use clock ;
##-- change net 'clk' use clock ;
##-- To Set Rows On Grid
SET VAR PLAN.IOROW.SNAPGRID.X 100 ;
SET VAR PLAN.IOROW.SNAPGRID.Y 100 ;
##-- Save design
##--
SAVE "A_NETLIST" ;
FLOAD DESIGN "A_NETLIST" ;
##-- Initialize the floorplan
FINIT FLOOR rowsp 0 blockhalo 20000 f abut x 2187000 y 2380800 xio 100000 yio 100000 ;
SET VARIABLE DRAW.ROW.AT ON;
SET VARIABLE DRAW.WIRE.AT HERE;;
SET VARIABLE DRAW.SWIRE.AT ON;
##-- Place the Periphery Cells
IOPLACE FILENAME "ioplace.ioc" STYLE EVEN ;

##############################################################
############# Placement of the RAM #####
##############################################################
SET VARIABLE SELECT.CELL.AT TRUE;
SELECT (-839726 -441927);
SELECT AREA (-853851 -394844) (-839726 -390136);
MOVE CELL U_I8051/U_RAM/U_128RAM FN AT (-652400 260400) ;
##############################################################
##############################################################

##-- Cut Rows around Blocks
CUT ROW BLOCKHALO 30000;
SAVE "B_FPLAN" ;
FLOAD DESIGN "B_FPLAN" ;
##-- Power Routing
##--
BUILD CHANNEL ;
##-- Add Power Stripes
##--
ADD STRIPE NET vdd! NET gnd! DIRECTION Vertical LAYER MET2 WIDTH 20000 SPACING 2000 COUNT 1 ALL ;

##-- Add Power Rings
##--
CONSTRUCT RING NET "vdd!" NET "gnd!"
    LAYER MET1 CORERINGWIDTH 40000 SPACING 2000 BLOCKRINGWIDTH 10000
    LAYER MET2 CORERINGWIDTH 40000 SPACING 2000 BLOCKRINGWIDTH 10000 ;
SAVE C_PPLAN ;
FLOAD DESIGN "C_PPLAN" ;
#####################################################################
##-- Add Cap cells
SROUTE ADDCELL MODEL ENDCAPL PREFIX lcap SPIN vdd! NET vdd! SPIN gnd! NET gnd! AREA ( -46000000 -46000000 ) ( 46000000 46000000 ) PREENDCAP ;
SROUTE ADDCELL MODEL ENDCAPR PREFIX rcap SPIN vdd! NET vdd! SPIN gnd! NET gnd! AREA ( -46000000 -46000000 ) ( 46000000 46000000 ) POSTENDCAP ;
##-- Place Standard Cells
QPLACE NOCONFIG ;
SAVE D_QPLACE ;
FLOAD DESIGN "D_QPLACE" ;
#####################################################################

##-- Write a DEF File - used as input for CTGen
OUT DEF F ./DEF/pre_clk.def ;
##-- Invoke CTGen
CTGEN FILENAME ctgen.cmd ;
##-- Import DEF with Clock Buffers
##-- DEF ECO
SET VAR ALLOW.EC TRUE ;
INPUT DEF ECO FILENAME "./CTGEN/post_clk.def" REPORTFILE "qplaced.defeco.rpt" KEEPDISTCELLS KEEPDISTNETS KEEPEEQMODELS KEEPLEQMODELS NOKEEPORIGPLACE ;

##-- Save the design
SAVE E_CTGEN ;
FLOAD DESIGN "E_CTGEN" ;
######################################################################
##-- Add Filler Cells
##-- Has do be done before routing !!!
EXEC fillcore.mac ;
##-- Fill gaps between periphery cells
EXEC fillperi.mac ;

##-- Finish Power Routing
##-- Follow Pins
CONNECT RING NET "gnd!" NET "vdd!" FOLLOWPIN ;
##-- IO Rings
CONNECT RING NET 'vdd3r1!' NET 'vdd3r2!' NET 'vdd3o!' NET 'gnd3r!' NET 'gnd3o!' IORING ;
##-- Connect Stripes
CONNECT RING NET "vdd!" NET "gnd!" STRIPE ;
##-- Connect Blocks
CONNECT RING NET "vdd!" NET "gnd!" BLOCK ALLPORT ;
CONNECT RING NET "vdd!" NET "gnd!" BLOCK ALLPORT ;
##-- Connect Power Pads
CONNECT RING NET "vdd!" NET "gnd!" IOPAD ALLPORT ;
SAVE F_PROUTE ;
FLOAD DESIGN "F_PROUTE" ;
##########################################################################
##-- Route the Clock nets
CLOCKROUTE ALL ;
SAVE G_CTROUTE ;
FLOAD DESIGN "G_CTROUTE" ;
##########################################################################
##-- Route all the nets
SET VAR WROUTE.GROUTE.ONLY FALSE ;
SET VAR WROUTE.FINAL TRUE ;
SET VAR WROUTE.GLOBAL TRUE ;
SET VAR WROUTE.INCREMENTAL.FINAL FALSE ;
WROUTE NOCONFIG ;
##-- Write Logical SDF
SET VAR TIMING.REPORT.LOGICAL.SDF.OUTPUT TRUE;
REPORT DELAY SDFOUTPUT FILENAME s8051.sdf ;
##-- Save the design
SAVE "H_s8051" ;
FLOAD DESIGN "H_s8051" ;
#########################################################################
##-- Save the design as DEF
OUTPUT DEF FILENAME "./DEF/s8051.def" ;
##-- Write RSPF
REPORT RC FILE s8051.rspf ;
##-- Report clock skew of routed design
REPORT CLOCKSKEW CMDFILE "ctgen_post.cmd" ;
##-- Write GDSII File
OUTPUT GDSII MAPFILE gds2.map STRUCTURENAME s8051 FILE s8051_se.gds2 UNITS Thousands ;


APPENDIX E c35b43.3V.gcf
/* Owner: austriamicrosystems AG HIT-Kit: Digital */
(gcf
  (header
    (version "1.2")
    (TIME_SCALE 1.0E-9)
    (CAP_SCALE 1.0E-12)
  )
  (globals
    (globals_subset environment
      (extension "TLF_FILES"
        (
          /app11/AMS_3.51_CDS_F/tlf/c35b4_3.3V/c35_CORELIB.tlf
          /app11/AMS_3.51_CDS_F/tlf/c35b4_3.3V/c35_CORELIB_3B.tlf
          /app11/AMS_3.51_CDS_F/tlf/c35b4_3.3V/c35_IOLIB_4M.tlf
          /class2/ug2/e05699a1/memory/cadence/sram128x8_43.tlf
        )
      )
      (operating_conditions "WORST-MIL" 1.36 3 125)
      /* (operating_conditions "WORST-IND" 1.36 3 85) */
      /* (operating_conditions "WORST" 1.36 3 75) */
      /* (operating_conditions "TYPICAL" 1 3.3 25) */
      /* (operating_conditions "BEST" 0.74 3.6 0) */
      /* (operating_conditions "BEST-IND" 0.74 3.6 -40) */
      /* (operating_conditions "BEST-MIL" 0.74 3.6 -50) */
    )
  )
)


APPENDIX F power_corner.def
DESIGN s8051 ;
UNITS DISTANCE MICRONS 1000 ;
COMPONENTS 6 ;
- CORNER1 CORNERP ;
- CORNER2 CORNERP ;
- CORNER3 CORNERP ;
- CORNER4 CORNERP ;
- GND4 GND3ALLP ;
- PWR4 VDD3ALLP ;
END COMPONENTS

SPECIALNETS 7 ;
- vdd3r1! ( CORNER1 vdd3r1! ) ( CORNER2 vdd3r1! ) ( CORNER3 vdd3r1! ) ( CORNER4 vdd3r1! ) ( PWR4 vdd3r1! ) ( GND4 vdd3r1! ) ;
- vdd3r2! ( CORNER1 vdd3r2! ) ( CORNER2 vdd3r2! ) ( CORNER3 vdd3r2! ) ( CORNER4 vdd3r2! ) ( PWR4 vdd3r2! ) ( GND4 vdd3r2! ) ;
- vdd3o! ( CORNER1 vdd3o! ) ( CORNER2 vdd3o! ) ( CORNER3 vdd3o! ) ( CORNER4 vdd3o! ) ( PWR4 vdd3o! ) ( GND4 vdd3o! ) ;
- gnd3o! ( CORNER1 gnd3o! ) ( CORNER2 gnd3o! ) ( CORNER3 gnd3o! ) ( CORNER4 gnd3o! ) ( PWR4 gnd3o! ) ( GND4 gnd3o! ) ;
- gnd3r! ( CORNER1 gnd3r! ) ( CORNER2 gnd3r! ) ( CORNER3 gnd3r! ) ( CORNER4 gnd3r! ) ( PWR4 gnd3r! ) ( GND4 gnd3r! ) ;
- vdd! ( PWR4 vdd! )
  + SPACING MET1 800 RANGE 10000 10000000
  + SPACING MET2 800 RANGE 10000 10000000
  + SPACING MET3 800 RANGE 10000 10000000
  + SPACING MET4 800 RANGE 1000 1000000 ;
- gnd! ( GND4 gnd! )
  + SPACING MET1 800 RANGE 10000 10000000
  + SPACING MET2 800 RANGE 10000 10000000
  + SPACING MET3 800 RANGE 10000 10000000
  + SPACING MET4 800 RANGE 1000 1000000 ;
END SPECIALNETS
END DESIGN


APPENDIX G ioplace.ioc
# Copyright (c) 1997 by Cadence. All rights reserved.
###################################################################
# In each of TOP()/BOTTOM()/LEFT()/RIGHT() section, there are #
# placed IOs. In the IGNORE() section, the IOs are ignored #
# by the IOPlacer. In every section, the IO syntax could be: #
# for pin: (IOPIN iopinName ); #
# for pad: iopadName orientation ; #
# for space: SPACE value; #
# The capital words are keywords. orientation is not required. #
# The value is the space between the IO above and the IO below it.#
###################################################################
IOPLACEHEADER (
    (VERSION 5.3 )
    (DIVIDERCHAR "/" )
    (BUSBITCHARS "[]" )
)
LEFT ( #IOs are ordered from bottom to top
    port0_0 FlipWest; #lmod:ICP net:port0[0]
    port0_1 FlipWest; #lmod:ICP net:port0[1]
    port0_2 FlipWest; #lmod:ICP net:port0[2]
    port0_3 FlipWest; #lmod:ICP net:port0[3]
    port0_4 FlipWest; #lmod:ICP net:port0[4]
    port0_5 FlipWest; #lmod:ICP net:port0[5]
    port0_6 FlipWest; #lmod:ICP net:port0[6]
    port0_7 FlipWest; #lmod:ICP net:port0[7]
    port2_0 FlipWest; #lmod:ICP net:port2[0]
    port2_1 FlipWest; #lmod:ICP net:port2[1]
    port2_2 FlipWest; #lmod:ICP net:port2[2]
    port2_3 FlipWest; #lmod:ICP net:port2[3]
)
RIGHT ( #IOs are ordered from bottom to top
    port1_7 West; #lmod:BBCU4P net:port1[7]
    port1_6 West; #lmod:BBCU4P net:port1[6]
    port1_5 West; #lmod:BBCU4P net:port1[5]
    port1_4 West; #lmod:BBCU4P net:port1[4]
    addr0 West; #lmod:BU2P net:addr[0]
    addr1 West; #lmod:BU2P net:addr[1]
    addr2 West; #lmod:BU2P net:addr[2]
    addr3 West; #lmod:BU2P net:addr[3]
    addr4 West; #lmod:BU2P net:addr[4]
    addr5 West; #lmod:BU2P net:addr[5]
    addr6 West; #lmod:BU2P net:addr[6]
    addr7 West; #lmod:BU2P net:addr[7]
    addr8 West; #lmod:BU2P net:addr[8]
    addr9 West; #lmod:BU2P net:addr[9]
    addr10 West; #lmod:BU2P net:addr[10]
    addr11 West; #lmod:BU2P net:addr[11]
    addr12 West; #lmod:BU2P net:addr[12]
)
TOP ( #IOs are ordered from left to right
    PWR4 FlipSouth; #lmod:VDD3ALLP net:vdd!
    rd_pad FlipSouth; #lmod:BU2P net:rd
    data7 FlipSouth; #lmod:BBC4P net:data[7]
    data6 FlipSouth; #lmod:BBC4P net:data[6]
    data5 FlipSouth; #lmod:BBC4P net:data[5]
    data4 FlipSouth; #lmod:BBC4P net:data[4]
    data3 FlipSouth; #lmod:BBC4P net:data[3]
    data2 FlipSouth; #lmod:BBC4P net:data[2]
    data1 FlipSouth; #lmod:BBC4P net:data[1]
    data0 FlipSouth; #lmod:BBC4P net:data[0]
    wr_pad FlipSouth; #lmod:BU2P net:wr
    addr15 FlipSouth; #lmod:BU2P net:addr[15]
    addr14 FlipSouth; #lmod:BU2P net:addr[14]
    addr13 FlipSouth; #lmod:BU2P net:addr[13]
)
BOTTOM ( #IOs are ordered from left to right
    rst_pad North; #lmod:ICUP net:rst
    sd_pad North; #lmod:ICP net:sd
    port3_5 North; #lmod:BBC4P net:port3[5]
    port3_4 North; #lmod:BBC4P net:port3[4]
    port3_3 North; #lmod:BBC4P net:port3[3]
    port3_2 North; #lmod:BBC4P net:port3[2]
    port3_1 North; #lmod:BBC4P net:port3[1]
    port3_0 North; #lmod:BBC4P net:port3[0]
    port1_3 North; #lmod:BBCU4P net:port1[3]
    port1_2 North; #lmod:BBCU4P net:port1[2]
    port1_1 North; #lmod:BBCU4P net:port1[1]
    port1_0 North; #lmod:BBCU4P net:port1[0]
    clk_pad North; #lmod:ICCK2P net:clk
    GND4 North; #lmod:GND3ALLP net:gnd!
)
IGNORE ( #IOs are ignored(not placed) by IO Placer
)


APPENDIX H PCB Layout


PCB Layout (top)


PCB Layout (bottom)


APPENDIX I loader.c
#pragma NOGCSE
#pragma NOINDUCTION
#pragma NOLOOPREVERSE

#include <my8051.h>

// SS   : P1_3
// MOSI : P1_2
// SCK  : P1_1
// MISO : P1_0
#define SS P1_3

unsigned char spi_bus;
unsigned char spi_buff;
//unsigned char SDCMD[6];
bit waiting;
//unsigned int delay;

xdata at 0x0000 unsigned char xram[16384];
xdata at 0x4000 unsigned char sd_buff[512];
xdata at 0x4200 unsigned long rel_sec;
xdata at 0x4204 unsigned int  reserve_sec;
xdata at 0x4206 unsigned int  sec_per_FAT;
xdata at 0x4208 unsigned long num_of_sec;
xdata at 0x420C unsigned char sys_ID[8];
xdata at 0x4214 unsigned char sec_per_clust;
xdata at 0x4215 unsigned int  cur_FAT_sec_no = 0xFF;
xdata at 0x4217 unsigned int  cur_cluster;
xdata at 0x4219 unsigned int  FAT[256];
xdata at 0x4419 unsigned int  max_root_entries;
xdata at 0x441B unsigned char file_entry[32];

void timer0_interrupt_handler(void) interrupt 1 {
    ET0 = 0;
    P1 = spi_bus;
    waiting = 0;
    ET0 = 1;
}

// SPI bus's functions
void run_spi() {
    // shift 8 bits through spi_buff
    unsigned char i;

    TR0 = 1;
    for (i = 0; i<8; i++) {
        if (spi_buff & 0x80) spi_bus = 0xF5;
        else spi_bus = 0xF1;                    // 1111sd01
        if (SS) spi_bus = spi_bus | 0x08;
        waiting = 1;
        while (waiting);
        spi_bus = spi_bus | 0x02;               // toggle SCK, set SCK to 1
        waiting = 1;
        while (waiting);
        spi_buff = (spi_buff << 1) | ((unsigned char) P1_0);
    }
    spi_bus = spi_bus & 0xFD;                   // toggle SCK, set SCK to 0
    waiting = 1;
    while (waiting);
    TR0 = 0;
}

void sdInit(void) {
    // init SD card
    unsigned char i, triesout, SDCMD[6];
    unsigned int delay;

    TR0 = 1;
    spi_bus = P1 = 0xFD;                        // SS:1,MOSI:1,SCK:0,MISO:1
    for (delay=0; delay<15170;delay++) {        // delay half a second
        waiting = 1;
        while (waiting);
    }
    TR0 = 0;
    for (i=0; i<8; i++) {
        spi_buff = 0xFF;
        run_spi();
    }
    spi_bus = 0XF5;
    P1 = 0xF5;                                  // SS:0, MOSI:1, SCK:0, MISO:1
    for (i=0; i<8; i++) {
        spi_buff = 0xFF;
        run_spi();
    }
    SDCMD[0] = 0x40;                            // cmd 0, reset SD card
    for (i = 1; i <= 4; i++) SDCMD[i] = 0;
    SDCMD[5] = 0x95;                            // CRC for cmd 0
    spi_buff = 0xFF;
    run_spi();
    for (i=0; i<6; i++) {
        spi_buff = SDCMD[i];
        run_spi();
    }
    spi_buff = 0xFF;
    for (i = 0; i < 8 && spi_buff == 0xFF; i++) run_spi();

    if (spi_buff != 0xFF) {
        triesout = 8;
        while (spi_buff && triesout--) {
            SDCMD[0] = 0x41;                    // cmd 1, init card
            for (i = 1; i <= 5; i++) SDCMD[i] = 0;
            spi_buff = 0xFF;
            run_spi();
            for (i=0; i<6; i++) {
                spi_buff = SDCMD[i];
                run_spi();
            }
            spi_buff = 0xFF;
            for (i = 0; i < 8 && spi_buff == 0xFF; i++) run_spi();
        }
    }
}

void sdRdBlk(unsigned long addr) {
    // read single block of 512 bytes data
    // into sd_buff starting from addr
    unsigned long ad = addr;
    unsigned int i;
    unsigned char SDCMD[6];

    SDCMD[0] = 0x51;                            // cmd 17, read single block
    for (i = 4; i > 0; i--) {
        SDCMD[i] = ad & 0xFF;
        ad = ad >> 8;
    }
    SDCMD[5] = 0;
    spi_buff = 0xFF;
    run_spi();
    for (i=0; i<6; i++) {
        spi_buff = SDCMD[i];
        run_spi();
    }
    spi_buff = 0xFF;
    for (i = 0; i < 8 && spi_buff == 0xFF; i++) run_spi();

    if (!spi_buff) {                            // R1 response == 0, no error
        spi_buff = 0xFF;
        for (i = 0; i < 8 && spi_buff == 0xFF; i++) run_spi();
        if (spi_buff == 0xFE) {                 // data token == 0xFE
            for (i=0; i<512; i++) {             // loop 512 times
                spi_buff = 0xFF;
                run_spi();
                sd_buff[i] = spi_buff;
            }
        }
    }
    spi_buff = 0xFF;
    run_spi();
    spi_buff = 0xFF;
    run_spi();
}

void getBootRecord(void) {
    unsigned char i;

    // read MBR from 0x0000 and extract the relative sectors
    sdRdBlk(0);
    rel_sec = ((unsigned long) sd_buff[0x01C9]) << 24 |
              ((unsigned long) sd_buff[0x01C8]) << 16 |
              ((unsigned int ) sd_buff[0x01C7]) << 8 |
              sd_buff[0x01C6];

    sdRdBlk(rel_sec << 9);
    sec_per_clust    = sd_buff[0x0D];
    reserve_sec      = ((unsigned int ) sd_buff[0x0F]) << 8 | sd_buff[0x0E];
    max_root_entries = ((unsigned int ) sd_buff[0x12]) << 8 | sd_buff[0x11];
    sec_per_FAT      = ((unsigned int ) sd_buff[0x17]) << 8 | sd_buff[0x16];
    num_of_sec       = ((unsigned long) sd_buff[0x23]) << 24 |
                       ((unsigned long) sd_buff[0x22]) << 16 |
                       ((unsigned int ) sd_buff[0x21]) << 8 |
                       sd_buff[0x20];

    for (i=0; i<8; i++) sys_ID[i] = sd_buff[0x36 + i];
}

unsigned int FATentry(unsigned int clust) {
    // entries/sector = 256
    // cluster number in sector = cluster % 256 = cluster & 0xFF
    unsigned int FAT_sec_no;
    unsigned int i;

    FAT_sec_no = (clust >> 8);
    if (FAT_sec_no != cur_FAT_sec_no) {
        sdRdBlk((rel_sec + reserve_sec + FAT_sec_no) << 9);
        for (i=0; i<256; i++)
            FAT[i] = ( ( (unsigned int) sd_buff[i<<1]) |
                       (((unsigned int) sd_buff[(i<<1)|0x01] )<<8) );
        cur_FAT_sec_no = FAT_sec_no;
    }
    return FAT[clust & 0x00FF];
}

void searchFile(unsigned char filename[11]) {
    unsigned char root_dir_sec_no, i;
    unsigned int root_dir_entry_no;

    root_dir_sec_no = 0x00;
    sdRdBlk((rel_sec + reserve_sec + (sec_per_FAT << 1)) << 9);
    file_entry[26] = 0;
    file_entry[27] = 0;
    for (root_dir_entry_no=0; root_dir_entry_no<max_root_entries; root_dir_entry_no++) {
        // search through each root directory entry for the file
        if ((root_dir_entry_no >> 4) != root_dir_sec_no) {
            root_dir_sec_no = (root_dir_entry_no >> 4);
            sdRdBlk((rel_sec + reserve_sec + (sec_per_FAT << 1) + root_dir_sec_no) << 9);
        }
        if ((sd_buff[((root_dir_entry_no & 0x0F) << 5) + 11] & 0x18) == 0x00) {
            // check if current root directory entry is a file (not directory or volume)
            for (i = 0; i<11; i++) {
                // check if filename matches character by character
                if (filename[i] != sd_buff[((root_dir_entry_no & 0x0F) << 5) + i])
                    break;                      // file doesn't match
            }
            if (i == 11) {                      // file found
                for (i = 0; i < 32; i++)
                    file_entry[i] = sd_buff[((root_dir_entry_no & 0x0F) << 5) + i];
                break;
            }
        }
    }
}

void loadFile(void) {
    unsigned long file_size;
    unsigned int i, j, addr;

    addr = 0;
    file_size = ((unsigned long) file_entry[31]) << 24 |
                ((unsigned long) file_entry[30]) << 16 |
                ((unsigned int ) file_entry[29]) << 8 |
                file_entry[28];
    cur_cluster = ((unsigned int)file_entry[27]<<8) | file_entry[26];
    // The (unsigned long) casts and the 512UL constant below keep the
    // products from wrapping in 16-bit int arithmetic.
    while (FATentry(cur_cluster) < 0xFFF0) {
        for (j=0; j<sec_per_clust; j++) {
            sdRdBlk(((rel_sec + reserve_sec + (sec_per_FAT<<1) + j +
                      (unsigned long)(cur_cluster-2)*sec_per_clust)<<9) +
                    (max_root_entries<<5));
            for (i=0; i<512; i++) xram[addr++] = sd_buff[i];
        }
        cur_cluster = FATentry(cur_cluster);
    }
    for (j=0; j<(file_size%(512UL*sec_per_clust))>>9; j++) {
        sdRdBlk(((rel_sec + reserve_sec + (sec_per_FAT<<1) + j +
                  (unsigned long)(cur_cluster-2)*sec_per_clust)<<9) +
                (max_root_entries<<5));
        for (i=0; i<512; i++) xram[addr++] = sd_buff[i];
    }
    sdRdBlk(((rel_sec + reserve_sec + (sec_per_FAT<<1) + j +
              (unsigned long)(cur_cluster-2)*sec_per_clust)<<9) +
            (max_root_entries<<5));
    for (i=0; i<file_size%512; i++) xram[addr++] = sd_buff[i];
}

void main() {
    // 11110001 = 241 = 0xF1
    TRIS1 = 0xF1;
    SS = 1;
    // SMOD | - | - | - | GF1 | GF0 | PD | IDL
    PCON = 0;       // set SMOD = 0
    // EA | - | - | ES | ET1 | EX1 | ET0 | EX0
    IE = 0x82;      // enable timer0 interrupt
    TMOD = 34;      // configure timer0 and timer1 to mode 2, auto reload
    TH0 = 175;      // SPI clock frequency = 19.6608MHz/(256-175)/8/2=15.17kHz
    sdInit();
    getBootRecord();
    searchFile("PROGRAM BIN");
    if ((file_entry[26] | file_entry[27]) != 0) {   // if file exist
        loadFile();
        PCON = 64;
    }
    else while(1);
}
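The address arithmetic used by FATentry(), searchFile() and loadFile() can be checked on a host machine. The following is a sketch in portable C, not the SDCC code of the listing: the struct and function names are hypothetical, and it assumes 512-byte sectors, two FAT copies and 16-bit FAT entries, as the loader does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model of the values read by getBootRecord(). */
struct fat16_geom {
    uint32_t rel_sec;          /* first sector of the partition */
    uint16_t reserve_sec;      /* reserved sectors before the first FAT */
    uint16_t sec_per_fat;      /* sectors per FAT copy */
    uint16_t max_root_entries; /* number of 32-byte root directory entries */
    uint8_t  sec_per_clust;    /* sectors per cluster */
};

/* Byte address of the FAT sector holding the entry for `clust`
 * (256 two-byte entries fit in one 512-byte sector). */
uint32_t fat_sector_addr(const struct fat16_geom *g, uint16_t clust)
{
    return (g->rel_sec + g->reserve_sec + (uint32_t)(clust >> 8)) << 9;
}

/* Index of the entry inside that FAT sector. */
uint16_t fat_entry_index(uint16_t clust)
{
    return clust & 0x00FF;
}

/* Byte address of the root directory, which follows both FAT copies. */
uint32_t root_dir_addr(const struct fat16_geom *g)
{
    return (g->rel_sec + g->reserve_sec + ((uint32_t)g->sec_per_fat << 1)) << 9;
}

/* Byte address of sector `sec` within data cluster `clust`.
 * Data clusters are numbered from 2 and follow the root directory. */
uint32_t cluster_sector_addr(const struct fat16_geom *g, uint16_t clust, uint8_t sec)
{
    assert(clust >= 2);
    return root_dir_addr(g)
         + ((uint32_t)g->max_root_entries << 5)
         + (((uint32_t)(clust - 2) * g->sec_per_clust + sec) << 9);
}
```

For a made-up card with rel_sec = 63, one reserved sector, 123 sectors per FAT, 512 root entries and 4 sectors per cluster, the root directory starts at byte 158720 and cluster 2 at byte 175104.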

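The timer reload value used in main() of loader.c can be cross-checked with a few lines of C. This sketch assumes, following the comment in the listing, that timer 0 counts at one eighth of the 19.6608 MHz oscillator and that two timer overflows make one SCK period. The function names are invented for the example, and results are in millihertz so that integer arithmetic suffices.

```c
#include <assert.h>
#include <stdint.h>

/* Overflow rate, in millihertz, of an 8-bit auto-reload timer.
 * `prescale` is the oscillator-to-timer divider (8 per the listing's comment). */
uint64_t timer_overflow_millihz(uint32_t osc_hz, uint32_t prescale, uint8_t reload)
{
    assert(prescale != 0);
    return ((uint64_t)osc_hz * 1000u) / ((uint64_t)prescale * (256u - reload));
}

/* SCK frequency in millihertz: one overflow per clock edge, two edges per period. */
uint64_t spi_clock_millihz(uint32_t osc_hz, uint32_t prescale, uint8_t reload)
{
    return timer_overflow_millihz(osc_hz, prescale, reload) / 2u;
}
```

With an oscillator of 19660800 Hz, a divider of 8 and TH0 = 175, the timer overflows at about 30.34 kHz and SCK runs at about 15.17 kHz, matching the comment in main(). The 15170 timer ticks counted in sdInit() then last very nearly half a second.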