
ISSN 2278-3091

Volume 4, No.2, March - April 2015
Vikas Shinde, International Journal of Advanced Trends in Computer Science and Engineering, 4(2), March - April 2015, 36 - 43

International Journal of Advanced Trends in Computer Science and Engineering


Available Online at http://www.warse.org/ijatcse/static/pdf/file/ijatcse06422015.pdf

Evaluation of Parallel Processing Systems through Queuing Model


Vikas Shinde
Department of Applied Mathematics,
Madhav Institute of Technology & Science, Gwalior-India

ABSTRACT

Jackson queueing networks have been widely used to model and analyze the performance of complex parallel systems. In this investigation, an M/G/1 queueing system is used to model a parallel processing system that is expandable in both the vertical and the horizontal direction. Closed-form solutions are determined for system performance metrics such as processor waiting time and system processing power.

Keywords: Queueing Network, Massive Parallel Processing, Shared Memory, Waiting Time.

1. INTRODUCTION

Parallel processing in computer systems has been widely studied because of its significant role in the fast, day-to-day computing of jobs. As parallel computing systems proliferate, the need for effective performance evaluation grows and queueing techniques become ever more important. In fact, the performance of such systems depends on the hardware resources (CPU, memory, etc.), on the software (system programs, compilers, etc.) and on the organization and management of these resources. In view of the increasing complexity of computing systems, it is more and more difficult to predict their performance indices based on analytical queueing models. In such models, it is convenient to represent the resources as servers and the programs as customers.

The parallel processing system modeled here is expandable in the vertical and horizontal directions and can be treated as a cluster serving a single queue of waiting jobs. A job is modeled as a sequence of independent stages which must be processed, where the number of processors desired by the job may differ from stage to stage. If, for some stage, the job in service requires fewer processors than the system provides, then the job occupies only the processors it needs and the other processors are idle for that stage. If, for some other stage, the job in service requires more processors than the system provides, then it uses all the processors in the system for an extended period of time such that the total work served in that stage is conserved.

Many researchers have extensively investigated parallel processing systems via queue theoretic approaches. Al-Saqabi et al. [1] established a distributed scheduling algorithm that tracks the available workstations in a network, i.e. the workstations not being used by their owners, and acts upon those workstations by scheduling processes of parallel applications onto them. Guan and Cheung [2] constructed a massively parallel processing system, which has drawn a lot of attention to an important feature affecting the performance and characteristics of the architecture: the interconnection of multiple processors.


Jean-Marie et al. [5] introduced a hybrid analytical approach using techniques from the theories of both stochastic task graphs and queueing networks. Jozwiak and Jan [6] discussed a quality-driven, model-based multiprocessor accelerator design method that adequately addresses the architecture design issues of hardware multiprocessors for modern, highly demanding embedded applications. Jan [7] studied communication architectures for massively parallel hardware multiprocessors. A systematic framework and a corresponding methodology for workload modeling of parallel systems was proposed by Kotsis [8]. Mohapatra et al. [10] proposed a structure in which the processors are divided into groups, or clusters, and organized in several stages. Maheshwari and Shen [11] established a clustering algorithm in which all the clusters have a balanced amount of computation load and there is only one communication path between any pair of clusters. Nassar [12] evaluated the throughput of several multibus systems as a discrete-time Markov chain under different working conditions. Reijns [13] considered the delay effect caused by memory interference in a parallel processing system with shared memory, which was implemented using machine repair queueing. Tomic [14] gave the matrix representation of the linear evolution operator of a certain class of parallel processing systems and used it effectively as a performance prediction tool for modern parallel processing systems. Wasserman et al. [15] studied the problem of dynamic allocation of the resources of a general parallel processing system comprised of several heterogeneous processors.

The rest of the paper is organized as follows. The model description is given in section 2. Section 3 describes the governing equations and the performance analysis. Conclusions are given in section 4.

2. MODEL DESCRIPTION

Every computer consists of a set of processors (CPUs) P1, P2, P3, ..., Pn and m ≥ 0 shared memory units M1, M2, M3, ..., Mm, which communicate via an interconnection network N, as illustrated in figure 1. The memory units constitute a global main memory that provides a convenient message depository for processor-to-processor communication. A system with this arrangement is called a shared memory computer. A global shared memory can be a serious bottleneck, particularly when the processors share large amounts of information, since normally only one processor can access a given memory module at a time. If the processors have their own local memories, then the global memory can be reduced in size, or even eliminated completely. To separate the functions of processing and memory, we refer to a CPU with no associated main memory, but with other temporary storage units such as register files and caches, as a processing element (PE).


[Figure: processing elements PE1, PE2, PE3, ..., PEn; interconnection network N; memory M]

Figure 1: Shared Memory


In the basic cluster shown in figure 2, each processing unit has a local memory for its own computation, and there is a shared memory for facilitating the communication between the processors. A horizontal communication network (HCN) is used for transmitting data between the processors and the shared memory. Moreover, the basic cluster includes a unit for I/O operations and a unit for supervising and managing the processors. A vertical communication network (VCN) is used for transmitting control signals and for the vertical expansion of the system.

The basic cluster can be extended in two ways: (i) by increasing the number of processing units or by using several basic clusters with one additional memory that is shared by those clusters, and (ii) as a two-stage system, where it must be noted that in the second level of the system there is an HCN that connects the VCN of each basic cluster to SM2. The units located inside the basic clusters are indicated by (SM1, HCN1, ...), and the units located outside of the cluster are indicated by (SM2, HCN2, ...).

[Figure: shared memory SM1; horizontal communication network; processors P1, P2, ..., PN with local memories LM1, LM2, ..., LMN; I/O unit; manager unit; vertical communication network]

Figure 2: Basic Cluster


[Figure: shared memory SM2; horizontal communication network 2; basic clusters (SM1, HCN 1, VCN 1), shown up to Cluster 2; I/O unit; manager unit; vertical communication network 2]

Figure 3: Two Stage System
This method can expand the system vertically, constructing an s-stage system. A cluster in the ith stage of the s-stage system is depicted in figure 4. Here a cluster includes some processing clusters (PCs), one I/O cluster and one managing cluster. There are two interconnection networks, HCNi and VCNi, which transmit data inside and outside of the clusters, respectively. Such systems can be expanded vertically by increasing the number of stages or horizontally by increasing the number of PCs in each level. In a system based on the multistage clustering structure, if the number of PCs that make up a cluster is equal for all clusters of the ith stage, the system is known as homogeneous at level i. If the system is homogeneous at all levels, it is called homogeneous; on the other hand, if it is not homogeneous in at least one stage, it is recognized as non-homogeneous or heterogeneous.
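The homogeneity definition above can be checked mechanically. The following Python sketch is an illustration only: the nested-list layout `pcs_per_cluster` and both function names are assumptions made for this example and are not part of the model.

```python
from typing import List, Sequence


def homogeneous_levels(pcs_per_cluster: Sequence[Sequence[int]]) -> List[bool]:
    """Return, for each stage, whether all of its clusters hold the same number of PCs.

    pcs_per_cluster[i] lists the number of PCs in every cluster of stage i
    (hypothetical input layout chosen for this sketch).
    """
    return [len(set(sizes)) <= 1 for sizes in pcs_per_cluster]


def is_homogeneous(pcs_per_cluster: Sequence[Sequence[int]]) -> bool:
    """The system is homogeneous only if every stage is homogeneous."""
    return all(homogeneous_levels(pcs_per_cluster))


if __name__ == "__main__":
    # Stage 0 has three clusters of 4 PCs each; stage 1 has clusters of 2 and 3 PCs.
    layout = [[4, 4, 4], [2, 3]]
    print(homogeneous_levels(layout))
    print(is_homogeneous(layout))
```

A system that fails the check at any single stage is heterogeneous in the sense used above, even if every other stage is uniform.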


[Figure: shared memory SMi; horizontal communication network i; processing clusters PCi-1, I/O cluster I/OCi-1 and managing cluster MCi-1; vertical communication network i; vertical expansion path; horizontal expansion path]

Figure 4: Cluster in ith stage of s stage system


3. THE PERFORMANCE ANALYSIS

For evaluating the performance of the system, consider a system constructed on the basis of a homogeneous MSCS. In this system every processor performs a piece of the main program, which is called the processor's job. During job execution, it is probable that a job needs to communicate with the other jobs. Therefore queues can be constructed for each of the interconnection networks and shared memories. Consider the following assumptions for analyzing the system.

- $C_i$ is the number of PCs in the ith stage of the system and $C_0$ is the number of processors in each basic cluster.
- The processors themselves generate the inter-job communication requests.
- The time between two consecutive requests is exponentially distributed with parameter $\lambda$.
- The access time to memory in the ith stage is exponentially distributed with parameter $\mu_{mi}$.
- The destination of each request is uniformly distributed among the processors' jobs, and the probability of an outgoing request from the ith stage is denoted by $P_i$.
- The service times of the interconnection networks in the ith stage are exponentially distributed with parameters $\mu_{hi}$ and $\mu_{vi}$ for HCNi and VCNi, respectively.
- Conflicts over memory modules and interconnection networks are resolved by a queueing center, which is modeled as M/G/1.
- Requesting processors must wait until they are served under the above scheme and, during the waiting period, they cannot generate any other request.

[Figure: queueing network of the s-stage system with nodes PROC, HCN1 to HCNs, VCN1 to VCNs-1 and SM1 to SMs; branches labelled $P_i$ and $1 - P_i$; rates labelled with subscripts $v_i$, $h_i$ and $m_i$]

Figure 5: Multi stage Cluster MPPs with s stage system

The input rate of each stage must be computed, and the queueing problem is analyzed by developing the M/G/1 model. When analyzing the design of MPPs with a large number of units, the amount of computation for a closed queueing network becomes very large. Queueing network methodology is therefore applied to the closed queueing network, and the input rate of each service center is determined as a function of the input rate of the previous center. This technique can reduce the calculation and simulation time.
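Since each service center is modeled as M/G/1, its mean occupancy follows from the standard Pollaczek-Khintchine result. The sketch below is illustrative only: the function name `mg1_mean_metrics` and its arguments (arrival rate `lam`, service rate `mu`, service-time variance `service_var`) are assumptions introduced for this example, and the returned time is the mean time a request spends at the center (waiting plus service).

```python
from typing import Tuple


def mg1_mean_metrics(lam: float, mu: float, service_var: float) -> Tuple[float, float]:
    """Mean number in system and mean time in system for one M/G/1 center.

    lam         -- arrival (input request) rate of the center
    mu          -- service rate, i.e. 1 / mean service time
    service_var -- variance of the service time
    All three names are hypothetical; they are not taken from the paper.
    """
    if lam <= 0 or mu <= 0 or service_var < 0:
        raise ValueError("lam and mu must be positive and service_var non-negative")
    rho = lam / mu  # utilization of the center
    if rho >= 1.0:
        raise ValueError("the center is unstable (rho >= 1)")
    # Pollaczek-Khintchine formula for the mean number of requests in the system.
    mean_in_system = rho + (rho ** 2 + lam ** 2 * service_var) / (2.0 * (1.0 - rho))
    # Little's law converts the mean number in system into the mean time in system.
    mean_time = mean_in_system / lam
    return mean_in_system, mean_time


if __name__ == "__main__":
    # Exponential service has variance 1 / mu**2.
    print(mg1_mean_metrics(lam=1.0, mu=2.0, service_var=0.25))
```

With exponential service (variance 1/mu²) the result reduces to the familiar M/M/1 value rho/(1 - rho), which is a convenient sanity check for the sketch.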


As shown in figure 5, every request departing from HCNi passes through SMi with probability one. Therefore, only the input request rates of the VCNs and HCNs need to be computed. The processor requests are directed to the service centers HCN1 and VCN1 with probabilities $(1 - P_1)$ and $P_1$, respectively. If the request rate of a processor is $\lambda$, the input rates of HCN1 and VCN1 that originate from that processor are $\lambda(1 - P_1)$ and $\lambda P_1$. Since there are $(C_0 - 1)$ other processors in each basic cluster, the requests received by HCN1 and VCN1 that originate from the other processors in the same cluster amount to $(1 - P_1)(C_0 - 1)\lambda$ and $P_1(C_0 - 1)\lambda$, respectively. The total request rates received by the service centers in the first stage, denoted $\lambda_{h1}$ and $\lambda_{v1}$, are the sums of these contributions, which gives equations (1) and (2).
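The first-stage bookkeeping described above (a processor's own requests plus those of the other C0 - 1 processors in its cluster) can be sketched as follows. This is a minimal illustration: the function name `first_stage_rates` and the dictionary keys are hypothetical, and `lam`, `p1` and `c0` stand for the per-processor request rate, the outgoing probability of stage 1 and the number of processors per basic cluster.

```python
from typing import Dict


def first_stage_rates(lam: float, p1: float, c0: int) -> Dict[str, float]:
    """Total input request rates of the first-stage service centers.

    lam -- request rate of a single processor
    p1  -- probability that a request leaves the basic cluster (goes to VCN1)
    c0  -- number of processors in each basic cluster
    Names and return layout are assumptions made for this sketch.
    """
    if lam < 0 or not 0.0 <= p1 <= 1.0 or c0 < 1:
        raise ValueError("need lam >= 0, 0 <= p1 <= 1 and c0 >= 1")
    # Requests generated by the processor itself.
    own_hcn = lam * (1.0 - p1)
    own_vcn = lam * p1
    # Requests generated by the other (c0 - 1) processors of the same cluster.
    others_hcn = (c0 - 1) * lam * (1.0 - p1)
    others_vcn = (c0 - 1) * lam * p1
    hcn1 = own_hcn + others_hcn
    vcn1 = own_vcn + others_vcn
    # Every request leaving HCN1 passes through SM1, so SM1 sees the HCN1 rate.
    return {"HCN1": hcn1, "SM1": hcn1, "VCN1": vcn1}


if __name__ == "__main__":
    print(first_stage_rates(lam=2.0, p1=0.25, c0=4))
```

The two totals always add up to `c0 * lam`, the combined request rate of the cluster, which gives a quick consistency check on the inputs.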


$$\lambda_{v1} = \lambda P_1 + (C_0 - 1)\lambda P_1 = C_0 \lambda P_1 \tag{1}$$

$$\lambda_{m1} = \lambda_{h1} = \lambda(1 - P_1) + (C_0 - 1)\lambda(1 - P_1) = C_0 \lambda (1 - P_1) \tag{2}$$

The input request rate at the ith stage from each PC is $\lambda_{v(i-1)}$, so that

$$\lambda_{vi} = P_i \lambda_{v(i-1)} + (C_{i-1} - 1) P_i \lambda_{v(i-1)} = C_{i-1} P_i \lambda_{v(i-1)} \tag{3}$$

$$\lambda_{mi} = \lambda_{hi} = (1 - P_i)\lambda_{v(i-1)} + (C_{i-1} - 1)(1 - P_i)\lambda_{v(i-1)} = C_{i-1}(1 - P_i)\lambda_{v(i-1)} \tag{4}$$

In the last stage there is no request for an outer cluster, so that

$$\lambda_{vs} = 0 \tag{5}$$

$$\lambda_{ms} = \lambda_{hs} = C_{s-1}(1 - P_s)\lambda_{v(s-1)} + C_{s-1} P_s \lambda_{v(s-1)} = C_{s-1}\lambda_{v(s-1)} \tag{6}$$

Now consider the M/G/1 model to calculate the queue length at each node for all stages; the average number of waiting processors in the system can then be computed. Using the Pollaczek-Khintchine formula gives

$$L = \rho + \frac{\rho^2 + \lambda^2 \sigma_s^2}{2(1 - \rho)} \tag{7}$$

where $\rho = \lambda/\mu$ is the utilization of the service center and $\sigma_s^2$ is the variance of its service time.

The waiting processors are not able to generate requests. In this situation the effective request rate of the processors is lower than the required one. The effective request rate decreases in the same ratio as the number of active processors in the system. $L$ and $\lambda$ are calculated successively until their changes in two consecutive steps are negligible. After calculating the effective request rate, the waiting time can be determined by Little's formula as

$$L = \lambda W \quad \text{or} \quad W = \frac{L}{\lambda} = \frac{1}{\mu} + \frac{\rho^2 + \lambda^2 \sigma_s^2}{2\lambda(1 - \rho)} = \frac{2\rho - \rho^2 + \lambda^2 \sigma_s^2}{2\lambda(1 - \rho)} \tag{8}$$

Here $P_{vi}$, $P_{mi}$ and $P_{hi}$ are the probabilities that a processor request is referred to VCNi, SMi and HCNi, respectively. They are computed by the following product-type solution:

$$P_{vi} = \prod_{j=0}^{i-1} P_{j+1} \tag{9}$$

$$P_{mi} = P_{hi} = \frac{(1 - P_i)}{P_i} \prod_{j=0}^{i-1} P_{j+1} \tag{10}$$

The average waiting time of a processor for each communication request then determines the processor utilization (PU):

PU = [right-hand side not recoverable from the source; it is expressed through the waiting time W] (11)

The total processing power of the system (TPP) is obtained by considering the single processor power (SPP). Thus

TPP = [factor indexed from i = 0, not recoverable from the source] × PU × SPP (12)

4. CONCLUSIONS

In this investigation, the performance of a parallel processing system is modeled as a sequence of stages, each of which requires a certain integral number of processors for a certain interval of time. This work proposed a new structure and
developed an analytical model for a massively parallel processing system based on queueing theory. The system performance metrics may provide insights to system designers and decision makers for improving the system at optimal cost.

REFERENCES

1. Al-Saqabi, K., Sarwar, S. and Saleh, K.: Distributed gang scheduling in networks of heterogeneous workstations, J. Computer Communications, Vol. 20, No. 5, pp. 338-348. (1997)
2. Guan, H. and Cheung, To-Yat.: Efficient approaches for constructing a massively parallel processing system, J. of Systems Architecture, Vol. 46, No. 13, pp. 1185-1190. (2000)
3. Hayes, J. P.: Computer Architecture and Organization, McGraw-Hill. (1998)
4. Hwang, K. H. and Xu, Z.: Scalable parallel computing, McGraw-Hill. (1998)
5. Jean-Marie, A., Lefebvre-Barbaroux, S. and Liu, Z.: An analytical approach to the performance evaluation of master-slave computational models, J. Parallel Computing, Vol. 24, No. 5-6, pp. 841-862. (1998)
6. Jozwiak, L., Jan, Y.: Design of massively parallel hardware multi-processors for highly demanding embedded applications, J. of Microprocessors and Microsystems, Vol. 37, pp. 1155-1172. (2013)
7. Jan, Y. and Jozwiak, L.: Scalable communication architectures for massively parallel hardware multi processors, J. of Parallel Distribution Computing, Vol. 72, pp. 1450-1463. (2012)
8. Kotsis, G.: A systematic approach for workload modeling for parallel processing systems, J. Parallel Computing, Vol. 22, No. 13, pp. 1771-1787. (1997)
9. Kleinrock, L.: Queueing Systems, Vol. II, Computer Applications, New York, Wiley. (1975)
10. Mohapatra, P., Das, C. R. and Feng, T. Y.: Performance analysis of cluster based multiprocessor, IEEE Trans. on Computer, Vol. 43, pp. 109-114. (1994)
11. Maheshwari, P. and Shen, H.: An efficient clustering algorithm for partitioning parallel programs, J. Parallel Computing, Vol. 24, No. 5-6, pp. 893-909. (1998)
12. Nassar, H.: A Markov model for multibus multiprocessor systems under asynchronous operation, J. Information Processing Letters, Vol. 54, No. 1, pp. 11-16. (1995)
13. Reijns, G. L. and Gemund, Van. J. C.: Analysis of a shared-memory multiprocessor via a novel queueing model, J. of System Architecture, Vol. 45, No. 14, pp. 1189-1193. (1999)
14. Tomic, D.: Spectral performance evaluation of parallel processing systems, J. Parallel Computing, Vol. 13, No. 1, pp. 25-38. (2002)
15. Wasserman, K. M., Michailidis G. and Bambos, N.: Optimal processor allocation to differentiated job flows, J. Performance Evaluation, Vol. 63, No. 1, pp. 1-14. (2006)