Module 3 Part 1

THE ILLIAC-IV
1. The Illiac-IV system was developed at the University of Illinois in the 1960s.
2. The system was fabricated by the Burroughs Corporation in 1972.
3. The original objective was to develop a highly parallel computer with a large
number of arithmetic units to perform vector/matrix computations at the rate of
10^9 operations per second.
4. The system was to employ 256 PEs under the supervision of four CUs.
5. Due to cost escalation and schedule delays, the system was limited to one
quadrant with 64 PEs under the control of one CU, with a speed of approximately
200 million operations per second.
6. The Illiac-IV computer has been applied in numerical weather forecasting and in
nuclear engineering research, among many other scientific applications.
The Illiac-IV System Architecture
1. The PEs are numbered from 0 to 63.

2. The data flow through the Illiac-IV array includes the CU bus for sending
instructions or data in blocks of eight words from the PEMs to the CU.
3. Data is represented in 64-bit or 32-bit floating-point, 64-bit logical, 48-bit or
24-bit fixed-point, or 8-bit character mode.
4. By utilizing these data formats, the PEs can hold vectors of operands with
64, 128, or 512 components.
5. The operating system supervises the execution of instructions fetched from the PEMs.
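The vector lengths quoted in item 4 are consistent with counting how many operands fit in one 64-bit word of each PE. The following Python sketch works through that arithmetic. It assumes that a PE word packs as many whole operands as fit in 64 bits; the function name `vector_components` is illustrative and does not come from the Illiac-IV documentation.

```python
NUM_PES = 64      # one quadrant of the Illiac-IV
WORD_BITS = 64    # width of one PE word


def vector_components(operand_bits):
    """Operands the whole array holds at one word address per PE.

    Assumption: each PE word packs WORD_BITS // operand_bits whole operands.
    """
    if not 0 < operand_bits <= WORD_BITS:
        raise ValueError("operand must fit in one PE word")
    return NUM_PES * (WORD_BITS // operand_bits)


for bits in (64, 48, 32, 24, 8):
    print(f"{bits:2d}-bit operands: {vector_components(bits)} components")
```

Under this assumption the 64-bit and 48-bit formats give 64 components, the 32-bit and 24-bit formats give 128, and the 8-bit character format gives 512.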
Common data bus

The common data bus is used to broadcast information from the CU to the entire
array of 64 PEs.
For example:
A constant multiplier need not be stored 64 times, once in each PEM; instead, the
constant can be stored in a CU register and then broadcast to each enabled PE.
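The broadcast of a constant multiplier can be pictured with a short Python sketch. It is a toy model, not Illiac-IV code: the function `broadcast_multiply` and the list-based representation of accumulators and enable bits are assumptions made for illustration.

```python
from typing import List

NUM_PES = 64


def broadcast_multiply(cu_register: float,
                       pe_accumulators: List[float],
                       enabled: List[bool]) -> List[float]:
    """Multiply each enabled PE's accumulator by one constant held in the CU.

    The constant exists once (in the CU) and is applied to every enabled PE;
    disabled PEs keep their old accumulator value.
    """
    return [acc * cu_register if on else acc
            for acc, on in zip(pe_accumulators, enabled)]


accs = [float(pe) for pe in range(NUM_PES)]
mask = [pe % 2 == 0 for pe in range(NUM_PES)]   # enable even-numbered PEs only
result = broadcast_multiply(2.5, accs, mask)
print(result[:4])   # [0.0, 1.0, 5.0, 3.0]
```

Only one copy of the constant 2.5 is kept; the 64 PEMs store nothing for it.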
Routing network
1. Special routing instructions are used to send information from one PE register to
another PE register via the routing network.
2. Special software determines the shortest path for each data-routing operation.
Load or store instructions
Load and store instructions transfer information between the PE registers and the
PEMs: a load moves data from a PEM into a PE register, and a store moves it back.
Mode-bit line
1. The mode-bit line consists of one line coming from the D (mode) register of each
PE in the array.
2. These lines can transmit the mode bits of the D registers in the array to the
accumulator register in the CU.
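With one line per PE, the 64 mode bits fit exactly into one 64-bit CU word. The Python sketch below illustrates that packing. The bit ordering (bit i corresponds to PE i) and the function name `gather_mode_bits` are assumptions for illustration; the source does not specify them.

```python
NUM_PES = 64


def gather_mode_bits(pe_enabled):
    """Pack one mode bit per PE into a single 64-bit integer.

    Assumption: bit i of the result is the mode bit of PE i.
    """
    if len(pe_enabled) != NUM_PES:
        raise ValueError("expected one mode bit per PE")
    word = 0
    for pe, on in enumerate(pe_enabled):
        if on:
            word |= 1 << pe
    return word


enabled = [pe < 4 for pe in range(NUM_PES)]   # only PEs 0..3 are enabled
word = gather_mode_bits(enabled)
print(f"{word:#018x}")   # 0x000000000000000f
```

Once the bits are in a CU register, the CU can test them with its ordinary scalar logic operations.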
B6500 host computer & I/O subsystem
1. The Illiac-IV communicates with the outside world through an I/O subsystem, a
disk file system, and a B6500 host computer, which supervises a large laser
memory (10^12 bits) and the ARPA network link.
2. The B6500 manages all programmer requests for system resources. The operating
system, including compilers, assemblers, and I/O service routines, resides in
the B6500.
3. As a total system, the Illiac-IV array is really a special-purpose back-end machine
of the B6500.
4. The ARPA (Advanced Research Projects Agency) network link makes the Illiac-IV
available to all members of the ARPA network.
Disk file system

The disk has 128 heads, one per track, with a 40-ms rotation period and an
effective transfer rate of 10^9 bits per second.
Major components in a PE include:

1. Four 64-bit registers: A is an accumulator, B is the operand register, R is the data-
routing register, and S is a general-storage register.
2. An adder/multiplier, a logic unit, and a barrel switch for arithmetic, boolean, and
shifting functions, respectively.
3. A 16-bit index register and an adder for memory address modification and
control.
4. An 8-bit mode register to hold the results of tests and the PE masking information.
Control unit (CU)
The control unit (CU) of the Illiac-IV array performs the following functions needed
for the execution of programs:
1. Controls and decodes the instruction streams.
2. Transmits control signals to PEs for vector execution.
3. Broadcasts memory addresses that are common to all PEs.
4. Manipulates data words common to the calculations in all PEs.
5. Receives and processes trap or interrupt signals.
6. The CU by itself is a scalar processor, in addition to its capability of
concurrently controlling the PE-array operations.
7. The CU arithmetic unit performs 64-bit scalar addition, subtraction, shift, and
logic operations.
Instruction buffer (PLA) & local data buffer (LDB)
1. The instruction buffer and local data buffer are 64-word fast-access buffers.
2. The PLA is associatively addressed to hold current and pending instructions.
3. The PLA can hold 128 instructions, sufficient to hold the inner loop of many programs.
4. The LDB is a data cache with 64 bits per word.
Accumulator registers
There are four accumulator registers (ACAR).
Address adder
Address arithmetic is performed by the 24-bit address adder.
Final queue
It is used to stack the addresses and data waiting to be transmitted to the PEs.
ADVAST (advanced instruction station)
1. PE instructions are decoded by the advanced instruction station (ADVAST) and
then transmitted via control signals to all PEs.
2. In fact, the ADVAST decodes all instructions and executes the CU instructions.
3. The ADVAST constructs the necessary address and data operands after decoding
a PE instruction.
Routing path
1. Each PE has a 64-bit-wide routing path to four neighbors.
2. To minimize the physical routing distance, the PEs are grouped in sets of eight.
3. Routing by a distance of plus or minus eight occurs interior to each group of eight
PEs.
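The routing structure, and the shortest-path search that the routing software has to perform, can be sketched in a few lines of Python. The sketch assumes that the four neighbors of PE i are PEs i - 8, i - 1, i + 1, and i + 8, all taken modulo 64; the text above states only the count of four neighbors and the distance of plus or minus eight. The names `neighbors` and `routing_steps` are illustrative, and the breadth-first search is a generic technique, not necessarily the method used by the Illiac-IV software.

```python
from collections import deque

NUM_PES = 64
ROUTE_OFFSETS = (-8, -1, 1, 8)   # assumed neighbor distances, modulo 64


def neighbors(pe):
    """PEs reachable from `pe` in one routing step (assumed topology)."""
    return [(pe + d) % NUM_PES for d in ROUTE_OFFSETS]


def routing_steps(src, dst):
    """Minimum number of single-hop routing steps from src to dst (BFS)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        pe = queue.popleft()
        if pe == dst:
            return dist[pe]
        for nxt in neighbors(pe):
            if nxt not in dist:
                dist[nxt] = dist[pe] + 1
                queue.append(nxt)
    raise ValueError("destination unreachable")


print(neighbors(0))            # [56, 63, 1, 8]
print(routing_steps(0, 9))     # 2
print(max(routing_steps(0, d) for d in range(NUM_PES)))   # 7
```

Under the assumed topology, no PE is more than seven routing steps away from any other.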
Applications of the Illiac-IV
1. The Illiac-IV was primarily designed for matrix manipulation and differential
equations.
2. Many ARPA network users attempt to use the Illiac-IV for their own applications.
3. The main difficulties in programming the Illiac-IV are identifying the identical
arithmetic computations in user programs that can be executed in parallel, and
distributing the data sets across the PEMs so that they can be accessed in parallel.
In this section, we examine several programming problems of the Illiac-IV.
In a conventional serial computer, the addition of two arrays (vectors) is realized
by the following Fortran statements:
      DO 100 I = 1, N
  100 A(I) = B(I) + C(I)
The Illiac-IV can perform the additions in the loop simultaneously by involving
all 64 PEs in synchronous lock-step fashion. The data must be allocated in the
PEMs so as to support parallelism in the PEs.
Example 6.1
Case 1: N = 64 (The problem size matches the array size)
The 64 components of the A, B, and C arrays are allocated in memory locations
α, α + 1, and α + 2 of the PEMs, respectively, one component per PEM.
The machine instructions are:
LDA α + 2 (Load the accumulators of all PEs with the C array)
ADRN α + 1 (Add the contents of the B array to the accumulators)
STA α (Store the results in the accumulators to the PEMs)
Note that all 64 loads in LDA, all 64 adds in ADRN, and all 64 stores in STA are
performed in parallel in only three machine instructions.
This amounts to a speedup of 64 over a conventional serial computer, which would
repeat the same three operations 64 times.
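Case 1 can be imitated with a toy Python model of the PE array. This is a sketch under simplifying assumptions: each PE is reduced to one accumulator and a small private memory, and the class `PEArray` with its methods `lda`, `adrn`, and `sta` is a hypothetical stand-in for the three machine instructions, not a description of the real hardware.

```python
NUM_PES = 64


class PEArray:
    """Toy model: every PE has one accumulator and a private memory (PEM)."""

    def __init__(self, pem_words):
        self.acc = [0.0] * NUM_PES
        # pem[pe][addr] is word `addr` of the memory attached to PE `pe`
        self.pem = [[0.0] * pem_words for _ in range(NUM_PES)]
        self.instructions_issued = 0

    def lda(self, addr):
        """All PEs load their accumulator from the same local address."""
        self.acc = [mem[addr] for mem in self.pem]
        self.instructions_issued += 1

    def adrn(self, addr):
        """All PEs add the word at the same local address to the accumulator."""
        self.acc = [a + mem[addr] for a, mem in zip(self.acc, self.pem)]
        self.instructions_issued += 1

    def sta(self, addr):
        """All PEs store their accumulator to the same local address."""
        for a, mem in zip(self.acc, self.pem):
            mem[addr] = a
        self.instructions_issued += 1


ALPHA = 0   # stands in for the base address α
array = PEArray(pem_words=3)
for pe in range(NUM_PES):
    array.pem[pe][ALPHA + 1] = float(pe)    # B(pe + 1)
    array.pem[pe][ALPHA + 2] = 10.0 * pe    # C(pe + 1)

array.lda(ALPHA + 2)
array.adrn(ALPHA + 1)
array.sta(ALPHA)

a_vector = [array.pem[pe][ALPHA] for pe in range(NUM_PES)]
print(a_vector[:3], array.instructions_issued)   # [0.0, 11.0, 22.0] 3
```

Each method call corresponds to one instruction broadcast by the CU, so the whole 64-element addition costs three instructions in this model.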

Case 2: N < 64 (The problem size is smaller than the array size)
In this case, only a subset of the 64 PEs is involved in the parallel operations. The
same memory allocation and machine instructions as in case 1 are needed, except that
some of the memory space and some of the PEs are masked off. The smaller N is
compared to 64, the more PEs and PEMs in the array sit idle.
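The effect of masking can be shown with a small Python sketch. It assumes that PEs 0 through N - 1 are the enabled ones, which the text does not specify, and the helper names `enable_mask`, `masked_add`, and `utilization` are illustrative.

```python
NUM_PES = 64


def enable_mask(n):
    """Mode bits for a vector of length n <= 64 (assumed: PEs 0..n-1 enabled)."""
    if not 0 <= n <= NUM_PES:
        raise ValueError("this sketch covers only N <= 64")
    return [pe < n for pe in range(NUM_PES)]


def masked_add(b, c, mask):
    """One lock-step add; disabled PEs produce no result (shown as None)."""
    return [bi + ci if on else None for bi, ci, on in zip(b, c, mask)]


def utilization(n):
    """Fraction of the 64 PEs doing useful work for a vector of length n."""
    return n / NUM_PES


n = 48
mask = enable_mask(n)
b = list(range(NUM_PES))
c = [2 * x for x in b]
a = masked_add(b, c, mask)
print(a[47], a[48], utilization(n))   # 141 None 0.75
```

With N = 48, a quarter of the array is idle during every one of the three instructions.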
Case 3: N > 64 (The problem size is greater than the array size)
The memory allocation problem becomes much more complicated in this case.
Consider the case N = 66.
The first 64 elements of the A, B, and C arrays are stored in
locations α, α + 2, and α + 4, respectively, in all PEMs.
The two residue elements of each array,
A(65), A(66); B(65), B(66); and C(65), C(66), are stored in locations α + 1, α + 3, and
α + 5, respectively, in PEM0 and PEM1.
The unused memory locations are indicated by question marks.
Six machine-language instructions are needed to perform the 66 load, add, and store
operations:
1. LDA α + 4 (Load the accumulators of all PEs with the first 64 elements of the C array)
2. ADRN α + 2 (Add the first 64 elements of the B array to the accumulators)
3. STA α (Store the results in the accumulators to the PEMs)
4. LDA α + 5 (Load the accumulators with the residue elements of the C array)
5. ADRN α + 3 (Add the residue elements of the B array to the accumulators)
6. STA α + 1 (Store the residue results in the accumulators to the PEMs)
The two residue data items in the A, B, and C arrays require three additional Illiac-IV
instructions. In fact, the above six instructions can be used to perform any vector addition
of dimension 65 ≤ N ≤ 128 on the Illiac-IV.
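The N = 66 layout and the instruction counts can be checked with a short Python sketch. The functions `placement` and `instruction_count` are illustrative helpers. The layout rule (each array owns two consecutive locations per PEM) follows the example above, while applying the count of three instructions per 64-element pass to other values of N is an extrapolation beyond the range the text confirms.

```python
NUM_PES = 64


def placement(i, base, rows=2):
    """Map the 1-based element index i to (PEM number, address).

    Assumption: an array owns `rows` consecutive locations starting at
    `base` in every PEM, as in the N = 66 layout (A at α, B at α + 2,
    C at α + 4).
    """
    row, pe = divmod(i - 1, NUM_PES)
    if i < 1 or row >= rows:
        raise ValueError("element index outside the allocated locations")
    return pe, base + row


def instruction_count(n):
    """Three instructions (LDA, ADRN, STA) per pass of up to 64 elements."""
    passes = -(-n // NUM_PES)   # ceiling division
    return 3 * passes


ALPHA = 0   # stands in for the base address α
print(placement(1, ALPHA))        # (0, 0): A(1)  in PEM0 at α
print(placement(66, ALPHA))       # (1, 1): A(66) in PEM1 at α + 1
print(placement(65, ALPHA + 4))   # (0, 5): C(65) in PEM0 at α + 5
print([instruction_count(n) for n in (64, 65, 66, 128)])   # [3, 6, 6, 6]
```

The second pass costs the same three instructions whether it serves 2 residue elements or a full 64, which is why N = 66 uses the array so much less efficiently than N = 128.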
