
986

IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 29, NO. 8, AUGUST 1994

An 8-bit Multitask Micropower RISC Core


J.-F. Perotto, C. Lamothe, C. Arm, C. Piguet, E. Dijkstra, S. Fink, E. Sanchez, J.-P. Wattenhofer, Member, IEEE, and M. Cecchini

Abstract: This paper describes a multitask micropower RISC core. A hardware scheduler handles up to four separate tasks in a pseudo-parallel way. Task (context) switching is performed at the instruction level and requires no additional instructions. In a 1.5-V low-power 2-µm technology, the core area is 5 to 6 mm², depending on the global routing of the complete ASIC. Measured power consumption is 0.2 µA/kHz at 1.5 V with a low-power 8K-word ROM and a 256-byte RAM.

I. INTRODUCTION
MICROPOWER microprocessors are important building blocks in watches, portable battery-operated instruments, mobile communications, medical devices, and computer peripherals. Such applications increasingly require processors that can operate at low supply voltages. Power consumption has to be minimized while a reasonable speed is maintained. This paper presents a microprocessor designed for micropower applications. In the targeted applications, functions are generally not executed continuously. The processor is therefore event driven, i.e., a task is started by an external event. When the task is completed, the processor enters sleep mode and waits for the next task. The implemented microprocessor has a multitask architecture, i.e., a hardware scheduler can handle up to four separate tasks in a pseudo-parallel way [1]. Context switching requires no additional instructions, which reduces the number of instructions an application executes. The basic architecture is an 8-bit RISC processor that executes one one-word instruction per cycle of four clock periods. Fewer clock periods per task in turn means lower power consumption. High-speed RISC processors [2], [3] and micropower RISC processors [4], [5] are notably similar in this respect: both minimize the number of clock periods per task. A supply voltage of 1.5 V is frequently chosen for battery-operated systems. The speed loss at very low voltages is generally compensated for by increased parallelism [6].
Manuscript received December 13, 1993; revised March 2, 1994. The design of the micropower microprocessor has been largely supported by the Centre Electronique Horloger S.A. (CEH), Switzerland, and the Commission pour l'Encouragement des Recherches Scientifiques (CERS), Switzerland. The name of this microprocessor is PUNCH. The chip integration has been carried out by EM Microelectronic, Marin, Switzerland. J.-F. Perotto, C. Lamothe, C. Arm, C. Piguet, and E. Dijkstra are with CSEM Centre Suisse d'Electronique et de Microtechnique SA, 2000 Neuchâtel, Switzerland. S. Fink and E. Sanchez are with Logic System Laboratory, Ecole Polytechnique Fédérale, Lausanne, Switzerland. J.-P. Wattenhofer and M. Cecchini are with ASULAB SA, Neuchâtel, Switzerland. IEEE Log Number 9402150.

Fig. 1. Parallel tasks and reactivity.

However, in microprocessor-like architectures, this approach requires several processors connected in parallel. We did not compensate for the speed loss and focused instead on minimizing chip area. At the start of the project, the goal was a microprocessor core of about 10,000 transistors.

II. PRINCIPLE OF THE MULTITASK ARCHITECTURE

Many control problems can be decomposed into several tasks. Fig. 1, for instance, shows a watch application with four tasks: motor control, time update, a chronograph, and crown and mode management. In this kind of application, task reactivity should be relatively high. In the watch example, the time update has to be performed even while other tasks are active. Fig. 2 shows how a conventional microprocessor solves the reactivity problem. A task is not executed continuously once it has started. It is split into several pieces of code. At the end of each piece, a software scheduler tests whether another task has to be executed or whether the current task can be resumed. A long-running task has to be interrupted many times so that the events can be polled continuously. The result is a complex software scheduler. An analysis of existing watch application programs showed that about 20% of the instructions of the complete program were used to manage the tasks, i.e., to implement the software scheduler. Fig. 3 shows the basic principle of the multitask processor. After each instruction executed by the microprocessor, a hardware scheduler selects another task if one is active (task switching). For real-time or reactive applications, any external event that must be handled immediately starts its task without delay. However, if other tasks are running, the task is executed more slowly (Fig. 4). Although the principle of multitasking is not new, the originality of the

0018-9200/94$04.00 © 1994 IEEE

PEROTTO et al.: AN 8-BIT MULTITASK MICROPOWER RISC CORE


[Fig. 1 panel "Parallel Tasks in a Watch Application": tasks TIME, MODES, and MOTOR.]

[Fig. 2. Software scheduler. Panel "Delayed starting tasks in a single processor": task parallelism and reactivity are performed by software; each task is segmented, and the execution sequence of each segment is performed by software. Tasks: TIME, MODES, MOTOR.]

[Fig. 3. Multitask principle. Panel "Principle of the MultiTask Architecture".]

[Fig. 4. Multitask principle. Task instructions continuously executed; instructions of 2 tasks alternately executed; same scheme as above for 3 tasks; starting task.]

[Fig. 5. PUNCH architecture principle: event bank, external events (ExtEvents), microprocessor.]

presented architecture is that task switching is performed after each instruction rather than after a packet of instructions, as in computers. The hardware scheduler makes it possible to reduce the number of instructions executed for a given application. If the hardware scheduler can be built with a negligible amount of hardware, this way of scheduling reduces power consumption. The design has been compared with a parallel architecture of four processors. The parallel architecture has about four times the throughput. However, it contains four times as many transistors and consumes four times as much power at the same frequency and the same VDD. In general-purpose microprocessors, an interrupt can start a task immediately. However, interrupts need extra steps to call

the interrupt routine and to save and restore the context. The multitask mechanism is therefore a faster and less power-consuming way to start a given task immediately. Furthermore, an interrupt routine is generally a short routine that handles an exception. Interrupts cannot manage several tasks that share the same priority level and must each be started immediately upon request. A multitask architecture is better suited to that case.
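The following Python sketch illustrates the per-instruction switching of Figs. 3 and 4. It models the scheduler as a token passed round-robin among four task slots. It is a simplified model built on stated assumptions. All requests are already set at cycle 0. A task's request is cleared once it runs out of instructions, which stands in for the halt instruction. Skipping an inactive task costs no cycle. The function name `run_round_robin` is hypothetical. Asynchronous event arrival, the enable switches, and the scheduler freeze are not modeled.

```python
from typing import Dict, List

NUM_TASKS = 4  # the core supports up to four tasks


def run_round_robin(tasks: Dict[int, List[str]]) -> List[str]:
    """Return the executed trace, one entry per instruction cycle.

    `tasks` maps a task number (0..3) to the instructions it still has to
    run; in this model a task is active while instructions remain.
    """
    if any(not 0 <= t < NUM_TASKS for t in tasks):
        raise ValueError("task numbers must be in 0..3")
    pcs = {t: 0 for t in tasks}  # one program counter per task
    trace: List[str] = []
    token = 0
    while any(pcs[t] < len(prog) for t, prog in tasks.items()):
        prog = tasks.get(token)
        if prog is not None and pcs[token] < len(prog):
            # Active task: execute exactly one instruction, then switch.
            trace.append(f"T{token}:{prog[pcs[token]]}")
            pcs[token] += 1
        # Inactive slots are skipped without consuming a cycle.
        token = (token + 1) % NUM_TASKS
    return trace


if __name__ == "__main__":
    # Task 0 alone: three instructions in three cycles.
    print(run_round_robin({0: ["a0", "a1", "a2"]}))
    # Task 2 starts at once, but task 0 now needs four cycles.
    print(run_round_robin({0: ["a0", "a1", "a2"], 2: ["b0"]}))
```

The second call shows the trade-off described above. The newly requested task 2 executes in the second cycle without waiting for task 0 to finish, and task 0 is slowed down in return.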
III. ARCHITECTURE OF THE MULTITASK MICROPROCESSOR

Fig. 5 shows the basic architecture of the multitask microprocessor, which can handle up to four tasks. The ROM is therefore addressed sequentially by one of four program counters, one for each task. The hardware scheduler selects the program counter, accumulator, and index register. Each program counter can address the full ROM address space of up to 8K instructions, so tasks of any length can be stored in the ROM. The 8-bit datapath contains an ALU and a set of 256 working registers. Four accumulators and four index registers are needed for the task switching that occurs after each instruction. Up to 256 peripherals can


[Fig. 6. Task configurations. Panels: one task and 3 subroutine levels; 2 tasks and 2 subroutine levels shared between tasks; 3 tasks and 1 subroutine level for task 1 or task 3; 4 tasks (processes 0 to 3).]

[Fig. 7. Scheduler and event routing: external events and soft events pass through an event router, with enable switches, to event groups 0 to 3.]

be addressed. The basic instruction set of the microprocessor contains nine generic 18-bit instructions, which yield 103 assembly instructions. All the working registers, peripherals, control registers, and flags are mapped into the same register file. All tasks can access the working registers, which are implemented in a shared RAM. A wait instruction based on semaphores has to be used to access the working registers [7]. A task can be started by external events that are provided by the peripherals. Other events, called software events, may be generated internally by a task. Each task can be started by one of two external and four software event sources. A task is completed with a halt instruction. As shown in Fig. 6, the microprocessor can be configured in different ways depending on the application. The programmer decides whether to decompose the problem into one, two, three, or four pseudo-parallel tasks. If fewer than four tasks are defined, the resources of the unused tasks (the program counters, accumulators, and index registers) can be reused to implement subroutine levels. For example, if only one task is defined, the three remaining program counters serve as subroutine pointers, which allows three subroutine levels. In this case, the microprocessor works as a conventional monoprocessor. Other possible configurations are two tasks with two subroutine levels, or three tasks with one subroutine level (Fig. 6). Subroutines can only be defined in task 0 or task 3 in order to keep such a mechanism

as simple as possible. The choice of a configuration is software programmable. Fig. 6 shows the seven possible configurations that can be chosen by the programmer.
IV. HARDWARE SCHEDULER

The hardware scheduler can manage up to four tasks. It is based on a circulating token that enables one active task after another (Fig. 7). A task is active if its Reqi signal is set. If a task is not active, the token is immediately passed to the next active task. A task is started by activating its Reqi signal. Software can disable a task through the PEi switches (Fig. 7). The Reqi signals can be set or reset by software, or they are driven by the event bank through an event router. To give a task absolute priority, the scheduler can be frozen so that it executes only this task. This freeze state is software programmable through the PEi switches (Fig. 7). A Process Control register in the register file controls the four Reqi flip-flops and the four PEi switches. Depending on the configuration, it can be useful to modify the event routing. For instance, if only one task is defined, the programmer can route all the events to this single task. If two tasks are defined, 1/2 or 3/4 of the events can be routed to one task, and 1/2 or 1/4, respectively, to the other. Fig. 8 shows a possible routing that can be


[Fig. 8. Example of event routing.]

[Fig. 9. Stack pointers. Panels show process 0 with processes 1 to 3 used as subroutine levels, and process 3 with processes 0 to 2 used as subroutine levels, each connected to the scheduler.]

[Fig. 10 (generic instruction set), recoverable encodings: jump on (p, cond) to a branch address; call to a call address; return; wait: if bit(f) = 1 then wait, else execute the next instruction in the same process; loadi: f := data, with f given by the faddr field; dataop.]

defined, depending on the selected configuration. This choice is software programmable through the mode register (Figs. 7 and 11). The microprocessor provides call and return instructions for sub-routines (Fig. 10). As stated above, sub-routines can be used if fewer than four tasks are selected. The resources attached to unused tasks (program counter, accumulator, index register) then serve as a sub-routine level. If, for instance, the microprocessor is configured for one task, task 0 can call three levels of sub-routines using the left stack pointer SPL (Fig. 9). The right stack pointer SPR can be used from task 3. The SPL and SPR stack pointers are incremented (call) or decremented (return) to select the next program counter. The current address of the calling program is kept in the corresponding program counter.
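The call/return mechanism can be sketched in Python as a minimal model of the left stack pointer, assuming the one-task configuration of Fig. 9. The class and method names are hypothetical. Two details are assumptions that the text does not state. First, the caller's program counter is advanced past the call before control moves to the next level, so it holds the return address. Second, call and return perform no other action. SPR, used from task 3, is not modeled.

```python
NUM_PCS = 4  # one program counter per task slot


class LeftSubroutineStack:
    """Toy model of SPL: task 0 borrows the program counters of unused tasks."""

    def __init__(self, start: int, levels: int = 3) -> None:
        if not 0 <= levels < NUM_PCS:
            raise ValueError("levels must be between 0 and 3")
        self.pc = [0] * NUM_PCS
        self.pc[0] = start
        self.spl = 0           # SPL selects the program counter in use
        self.levels = levels   # 3 when only one task is configured

    @property
    def current(self) -> int:
        """Address of the next instruction of the running level."""
        return self.pc[self.spl]

    def step(self) -> None:
        """Execute one ordinary instruction (sequential fetch)."""
        self.pc[self.spl] += 1

    def call(self, address: int) -> None:
        if self.spl >= self.levels:
            raise OverflowError("no free program counter for another level")
        self.pc[self.spl] += 1   # assumed: caller's PC keeps the return address
        self.spl += 1            # call increments SPL
        self.pc[self.spl] = address

    def ret(self) -> None:
        if self.spl == 0:
            raise IndexError("return without a matching call")
        self.spl -= 1            # return decrements SPL; caller resumes
```

Because each level is simply another program counter, the nesting depth equals the number of unused task slots. This is why no sub-routine calls remain once all four tasks are in use.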

[Fig. 10 (continued), dataop types: type=010: f := f <op>; type=011: f := accu <op>; type=100: accu := f <op> accu; type=101: ix := f <op> ix; type=110: f := f <op> accu; type=111: f := accu <op> ix. dataopi: r := data <op> r (r = accu | ix). bitop types (type=00, 01, 10, 11): bit(f) := 1; bit(f) := NOT bit(f); z := NOT bit(f); see wait instruction.]

Fig. 10. Generic instruction set.

V. INSTRUCTION SET
The microprocessor has a RISC instruction set, with as few instructions as possible. Nine generic instructions have been defined, which yield 103 assembly mnemonics recognized by the assembler. However, the most important RISC characteristic is the instruction format: every instruction is a single 18-bit word. Furthermore, each instruction executes in one cycle of four phases; only the table instruction takes two cycles. This single-cycle execution contributes significantly to minimizing power consumption. Fig. 10 shows the nine 18-bit generic instructions. The branch address field is 13 bits, and the jump instruction can test eight conditions (carry, not carry, zero, not zero, unconditional


[Fig. 11. File mapping. Instruction address field (f) coding, bits 8 to 0:
  0xxxxxxxx   RAM address (direct)     256 working registers (RAM), addresses 0-255
  10xxxxxxx   Direct peripheral        128 direct-access peripherals, addresses 256-383
  110xxxxxx   Indexed RAM              relative indexed RAM access, addresses 384-447
  1110xxxxx   Indexed peripheral       relative indexed peripheral access, addresses 448-479
  1111xxxxx   Control registers        addresses 480-511
 *) IOType and IOProcess are read only. Some registers are used for test and emulation.]

[Fig. 12. Chip microphotograph.]

branch, ...). If the address is 0, the indexed branch address is formed by concatenating the accumulator content (7 MSBs) with the index register content (6 LSBs). The table instruction reads data tables stored in the ROM. The call and return instructions can be used only if fewer than four tasks are defined. With four tasks, no sub-routine calls are possible. This limitation was introduced to reduce the microprocessor core area. Nevertheless, an arbitrary number of sub-routines can be defined with a software call-return mechanism. The assembler developed for this microprocessor recognizes the predefined macro-instructions callS and returnS, each implemented with three assembly instructions. The wait instruction performs a test-and-set operation, i.e., it tests a boolean that protects a shared resource (the RAM). If the resource is free, the next instruction must set this boolean to the busy state (bitop instruction). If the resource is busy, the task waits until it is freed, executing the wait instruction repeatedly to test the boolean. The boolean can be any bit in a working register. The ALU operations are addition, addition with carry, subtraction, subtraction with borrow, comparison, logical OR, AND, XOR, and NOT, as well as increment and decrement. The bitop instruction can modify any bit of the register file.

VI. REGISTER FILE

Fig. 11 shows the register file of the microprocessor. All the working registers can be accessed by the address field

(f addr) of the loadi, dataop, and bitop instructions (Fig. 10). Up to 256 working registers can be addressed in direct or indexed mode. In indexed mode, a displacement of 0 to 63 is added to the index register. The microprocessor can address 256 peripherals with relative indexed addressing, plus 128 peripherals in direct mode. The events (Fig. 7) are available in four control words EventGroupi, one for each task. The PEi and Reqi signals are also available in a control register. The Reqi signals define which tasks are active. Events set them, and software can reset them to stop a task (halt instruction). The accumulator and the index register of the current task are mapped into the register file. The control register SysWord defines the event routing mode (Fig. 7). SysWord2 holds the number of the active task and the stack pointer values. SysWord3 contains some control bits, for instance to reset the processor (reset) or to freeze the scheduler in its current task (hold). The remaining registers are used for testing the microprocessor.
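The address computations described in Sections V and VI can be sketched in Python as follows. The bit prefixes follow the f coding of Fig. 11. The function names and the string labels for address spaces are hypothetical. Two further details are assumptions made for illustration. Indexed addresses are taken to wrap modulo 256. The indexed branch is taken to use the low-order 7 bits of the accumulator and 6 bits of the index register.

```python
from typing import Tuple


def effective_address(f: int, ix: int = 0) -> Tuple[str, int]:
    """Decode a 9-bit f addr field into (address space, address)."""
    if not 0 <= f < 512:
        raise ValueError("f addr is a 9-bit field")
    if f < 0b100000000:            # 0xxxxxxxx: direct RAM, 0..255
        return ("RAM", f)
    if f >> 7 == 0b10:             # 10xxxxxxx: 128 direct peripherals
        return ("periph", f & 0x7F)
    if f >> 6 == 0b110:            # 110xxxxxx: ix + displacement 0..63
        return ("RAM", (ix + (f & 0x3F)) & 0xFF)       # wrap is assumed
    if f >> 5 == 0b1110:           # 1110xxxxx: indexed peripheral
        return ("periph", (ix + (f & 0x1F)) & 0xFF)    # wrap is assumed
    return ("control", f & 0x1F)   # 1111xxxxx: control registers


def branch_target(addr_field: int, accu: int, ix: int) -> int:
    """13-bit jump target; an address field of 0 selects the indexed form."""
    if addr_field != 0:
        return addr_field & 0x1FFF
    # 7 accumulator bits become the MSBs, 6 index-register bits the LSBs.
    return ((accu & 0x7F) << 6) | (ix & 0x3F)
```

For example, `effective_address(387, ix=100)` falls in the indexed-RAM range with displacement 3 and resolves to working register 103. Also, 7 + 6 bits give exactly the 13-bit, 8K-instruction range of the branch address field.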

VII. PERFORMANCE OF THE MULTITASK MICROPROCESSOR


The multitask microprocessor core contains 10,878 MOS transistors. A careful examination showed that the multitask mechanism costs no more area than the interrupt systems of monotask microprocessors. With an 8K × 18-bit ROM and 256 8-bit working registers, the system contains about 165,000 MOS transistors. The core has been designed with a low-power library [8], [9] and routed with the Compass tools. Any of the technologies supported by the library can be used (1.0-µm or 2.0-µm process). In a 1.5-V low-power 2-µm technology, the core area is about 5 to 6 mm². The microprocessor core has been integrated in both a 1.5-V and a 5.0-V 2-µm process, and the integrated device worked the first time. Fig. 12 shows a microphotograph of a test chip with the microprocessor core realized in standard cells, a


ROM, and working registers implemented as a RAM. Measured power consumption is 0.2 µA/kHz at 1.5 V with a low-power 8K-word ROM and a 256-byte RAM. For watch applications at 32 kHz, the power consumption is about 10 µW in the running state. However, the circuit spends more than 95% of the time in sleep mode, so the average power consumption is less than 1 µW. At 3 V, the microprocessor can easily run at 1 MHz and execute 250K instructions/s, or 0.25 MIPS (a cycle is four clock periods). The power consumption is 0.6 mW when running continuously, or about 400 MIPS/W. Compared with other microprocessors [4], power consumption is reduced by more compact multitask programming, which executes fewer instructions. These low-power characteristics result from a combination of techniques [10]. At the architectural level, both the RISC-like architecture (one instruction executed in one cycle) and the multitask architecture reduce the instructions and cycles needed for a given task. At the logic level, low-power memories and a low-power cell library [8], [9] have been used. Experience has shown that such a library typically reduces power by a factor of 3 compared with conventional libraries. The development tools of the microprocessor include an assembler, a compiler for a Pascal-like high-level language, and a real-time emulator running on Macintosh II computers.
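The figures quoted above are consistent with simple arithmetic, sketched below in Python. The helper names are hypothetical. The duty-cycle average assumes that power in sleep mode is negligible, which the text does not state explicitly.

```python
def running_power_uw(ua_per_khz: float, f_khz: float, vdd: float) -> float:
    """Running power in µW from a current-per-frequency figure (µA × V = µW)."""
    return ua_per_khz * f_khz * vdd


def average_power_uw(running_uw: float, sleep_fraction: float,
                     sleep_uw: float = 0.0) -> float:
    """Duty-cycle average; sleep power defaults to zero (an assumption)."""
    return (1.0 - sleep_fraction) * running_uw + sleep_fraction * sleep_uw


def mips(clock_hz: float, clocks_per_instruction: int = 4) -> float:
    """Instruction rate in MIPS for one instruction per four clock periods."""
    return clock_hz / clocks_per_instruction / 1e6


watch_run = running_power_uw(0.2, 32.0, 1.5)   # about 9.6 µW ("about 10 µW")
watch_avg = average_power_uw(watch_run, 0.95)  # about 0.48 µW, below 1 µW
rate = mips(1e6)                               # 0.25 MIPS at 1 MHz
mips_per_watt = rate / 0.6e-3                  # about 417 MIPS/W at 0.6 mW
print(f"{watch_run:.2f} uW running, {watch_avg:.2f} uW average")
print(f"{rate:.2f} MIPS, {mips_per_watt:.0f} MIPS/W")
```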

VIII. CONCLUSION

The multitask microprocessor presented here has been used in watches and as a general-purpose core for industrial control. It offers significant advantages for the structured programming of pseudo-parallel tasks in micropower applications.

REFERENCES
[1] J.-F. Perotto et al., "Multitask micropower CMOS microprocessor," ESSCIRC'93, Sevilla, Spain, Sept. 21-23, 1993.
[2] D. A. Patterson and C. H. Séquin, "A VLSI RISC," IEEE Comput., pp. 8-21, Sept. 1982.
[3] D. Tabak, RISC Architecture. Research Studies Press, Wiley, 1987.
[4] M. Ansorge et al., "Design methodology for low-power full custom RISC microprocessor," in Proc. EUROMICRO 86, North-Holland, 1986, pp. 427-434.
[5] C. Piguet, "Binary-decision and RISC-like machines for semicustom design," Microprocessors and Microsystems, vol. 14, no. 4, pp. 231-240, May 1990.
[6] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, "Low-power CMOS digital design," IEEE J. Solid-State Circuits, vol. 27, no. 4, pp. 473-484, Apr. 1992.
[7] P. Brinch Hansen, Operating System Principles. Englewood Cliffs, NJ: Prentice Hall, 1973.
[8] J.-M. Masgonty et al., "Technology- and power-supply-independent cell library," IEEE CICC'91, San Diego, CA, May 12-15, 1991, Conf. 25.5.
[9] J.-M. Masgonty et al., "Branch-based digital cell libraries," EUROASIC'91, Palais des Congrès, Paris, France, May 27-31, 1991.
[10] C. Piguet et al., "Basic design techniques for both low-power and high-speed ASICs," EUROASIC'92, Paris, France, June 2-4, 1992, pp. 220-225.
[11] V. von Kaenel et al., "Bilan de consommation de montres électroniques," CEC92, 4e Congrès Européen de Chronométrie, Lausanne, Switzerland, Oct. 29-30, 1992.
