Environment
EunJoung Byun, SungJin Choi, MaengSoon Baik, ChongSun Hwang
Dept. of Computer Science and Engineering, Korea University
Anam-Dong, Seongbuk-Gu, Seoul 136-701, Korea
{vision, lotieye, msbak, hwang}@disys.korea.ac.kr
ChanYeol Park
Supercomputing Center, KISTI, Korea
chan@kisti.re.kr
Abstract
A volunteer node can join and leave a volunteer computing system freely. However, existing volunteer computing
systems suffer from interrupted job execution, delayed execution, and
increased total execution time, because
they do not consider dynamic scheduling properties (i.e.,
volatility) such as leave, join, and suspension. Therefore,
the dynamic execution properties of volunteer nodes should be
considered in scheduling schemes in order to design a stable and reliable volunteer computing system. This paper
proposes a new scheduling scheme based on the Dedication
Rate (DR), which reflects the dynamic properties of a volunteer. The scheduling scheme improves the completeness
and reliability of execution, while also decreasing delay and
total execution time. In addition, an implementation of the
proposed scheduling scheme on top of Korea@Home is described, together with a performance evaluation.
1. Introduction
Volunteer Computing (VC)[1] is an Internet-based
parallel computing paradigm that processes large-scale computations through the participation of idle volunteer computing resources on the Internet. This technology is also known as
Desktop GRID, Peer-to-Peer GRID, and Global Computing. VC builds large-scale computing power on
the proliferation of personal computers and the rapid growth
of the Internet. This attractive computing paradigm aims
at achieving high throughput and performance. The system is distributed over a large number of computers,
therefore effectively decreasing the load while increasing performance. SETI@home[7] is a well-known example of
the tremendous computing power made possible by
the Internet. The main difference between VC and traditional GRID Computing is the stability of resources. In existing GRID
systems most resources are dedicated to the system,
whereas in a Volunteer Computing System (VCS) resources are volatile and can leave and join freely. In addition, VC aims to make
the system easy to access and use. Users
can participate in the system easily and use it conveniently.
VC does not require any technical background,
complex setup, or other unique requirements. VC harvests
idle cycles of hundreds of thousands to millions of desktops
connected over the Internet. It offers tremendous potential
computing power at low cost when compared to traditional
GRID Computing.
SoonYoung Jung
Dept. of Computer Education, Korea University
jsy@comedu.korea.ac.kr
This paper was supported by the Korea@Home project from the Korea
Institute of Science and Technology Information (KISTI).
Volatile participation of Volunteer Nodes (VNs), however,
makes a VCS unstable. Job suspension may occur frequently
due to this volatility, delaying total execution time. Volatility is the state in which a resource is unavailable for
use, either because of the autonomous and dynamic behavior
of participants or because execution is interrupted by the activities of the
user. Volatility is caused by individual machine
crashes, software failures, and intermittent user connectivity. It can also be caused by temporary disconnection of the physical communication link. This volatility
is the main cause of decreased reliability and increased
total execution time. The VCS cannot guarantee that
a VN will finish as scheduled or that the execution will
be stable. Scheduling schemes are therefore required
that can sustain performance despite the variability and
volatility of volunteers. This requirement has motivated the proposed method. An existing scheduling scheme
in a VCS, for example, Eager scheduling scheme, is inca-
Proceedings of the 4th International Symposium on Parallel and Distributed Computing (ISPDC05)
0-7695-2434-6/05 $20.00 2005
IEEE
2. Related Work
2.2. Availability
Availability is generally defined as the property that a system is ready to be used immediately. It
may be represented as the probability that the system operates correctly at any given time. Recently, availability has
become an important issue in Desktop GRID environments
and Peer-to-Peer systems, both of which use
personal computers as the core resource. Much existing research focuses on CPU clocks, memory capacity, and network bandwidth.
In this paper, the focus is on durability of execution
rather than on physical capacities, because the highly dynamic and uncertain activities of a volunteer lead to
task failure, performance deterioration, and system instability. As a result, execution is delayed and total execution
time increases due to the volatility of desktops.
In [8], the scheduling scheme is mainly concerned
with the capability of the volunteer desktop when selecting an
eligible volunteer for job execution. The availability distribution in Desktop GRID environments is described in [12],
where experimental measurements are compared with mathematical distributions. However, that work does not take into account intermittent job execution caused by the user. [10]
and [9] use a simple metric, the percentage of available CPU
cycles of desktop machines. These studies, however, suffer from job suspension during execution, because they do
not consider user interruptions such as keyboard or mouse
activity. [14] not only gives insight into the detailed temporal structure of CPU availability of Desktop GRID resources, but also offers extensive measurements of an enterprise Desktop GRID on a real platform. This study
divides availability into host availability and CPU availability, to separate the state in which a machine is actually executing from the state in which
it is merely turned on. In addition, simple statistical measurements are offered. In Peer-to-Peer systems, [11] and [13] distinguish
resources only by whether they are usable at a given time or not. Existing scheduling schemes for the Desktop GRID either
rarely consider availability or utilize only simple
statistical availability. These simple availability concepts
are inadequate to represent the features of a volunteer node
and to guarantee execution stability and execution completion. To improve performance and reliability, a more
precise metric for the availability of volunteer desktops is
proposed.
[Figure: system organization diagram. Labels recoverable from the diagram: Job Commissioner, Group computers, VN, LAN, INTERNET.]
4. Dedication Rate
VC = JC_i  CS[ GM_i ]  VN_i, with each indexed term ranging from i = 1 (upper limits m and n; the combining operators are not recoverable)    (1)
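$$EAR = \left(\frac{EA_i}{EA_{VN_{max}}}\right)\left(\left[-\sum_{i=1}^{n} P_i \log P_i\right]_{SYS_{max}} - \left[-\sum_{i=1}^{n} P_i \log P_i\right]_{VN}\right) \qquad (3)$$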
Figure 2. HA and EA
The HA and EA of each VN are measured as the sum of the differences between the start time and end time of each availability interval, where
the n-th interval is the pair V_n = [t_s^n, t_e^n] in Figure 3. Here t_s^n
is the start time and t_e^n the end time of the n-th availability interval. HA, HU, EA, and EU are each recorded
in a temporal availability profile on the VN.
[Figure 3: availability intervals V_1 = [t_s^1, t_e^1], V_2 = [t_s^2, t_e^2], ..., V_i = [t_s^i, t_e^i].]
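As an illustration of this measurement, the following Python sketch sums the lengths of the intervals stored in a profile. The `Interval` alias, the function name `total_availability`, and the sample profile are assumptions made for the example; they are not part of Korea@Home.

```python
from typing import List, Tuple

# One availability interval V_n = [t_s, t_e], in minutes (illustrative unit).
Interval = Tuple[float, float]


def total_availability(intervals: List[Interval]) -> float:
    """Sum of (end - start) over all recorded availability intervals."""
    total = 0.0
    for start, end in intervals:
        if end < start:
            raise ValueError(f"interval ends before it starts: {(start, end)}")
        total += end - start
    return total


# Hypothetical profile of one VN: three availability intervals.
profile = [(0.0, 30.0), (45.0, 60.0), (90.0, 120.0)]
ha = total_availability(profile)
print(ha)
```

The same function applies to HA and EA alike; only the recorded intervals differ.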
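$$HAR = \left(\frac{HA_i}{HA_{VN_{max}}}\right)\left(\left[-\sum_{i=1}^{n} P_i \log P_i\right]_{SYS_{max}} - \left[-\sum_{i=1}^{n} P_i \log P_i\right]_{VN}\right) \qquad (2)$$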
In a VCS with dynamic participation and execution, successful execution is determined by the durable availability
of a volunteer. An accurate estimate of VN availability
is important in guaranteeing the performance of the VCS.
However, the duration of availability is unknown in advance, so it is difficult to predict participation and
execution patterns and to estimate future performance accurately. To improve accuracy, a predictable availability duration is derived with a heuristic approach that considers past patterns of execution and participation.
This durable and predictable factor, DR, is measured
from past experience using VA profiles. The CS manages profiles for the HA and EA of each VN. Availability by itself gives only a probability that the system is in an accessible state. To predict
the participation pattern of a system more accurately, DR is introduced using the entropy of availability, because the simple availability used in existing
schemes is often inaccurate or insufficient, and its characteristics vary over
time.
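The entropy of availability can be sketched as a Shannon entropy over an availability profile. The sketch below assumes that P_i is the share of a VN's total available time that falls into time slot i; this definition of P_i, the function name, and the sample profiles are assumptions for illustration only.

```python
import math
from typing import Sequence


def availability_entropy(slot_durations: Sequence[float]) -> float:
    """Shannon entropy -sum(P_i * log P_i) of an availability profile.

    Assumption: P_i is the fraction of the total available time that
    falls into slot i. Slots with zero availability contribute nothing.
    """
    total = sum(slot_durations)
    if total <= 0:
        raise ValueError("profile contains no available time")
    entropy = 0.0
    for duration in slot_durations:
        if duration > 0:
            p = duration / total
            entropy -= p * math.log(p)
    return entropy


# A VN available in one fixed slot is fully regular: entropy 0.
regular = availability_entropy([60.0, 0.0, 0.0, 0.0])
# A VN spread evenly over four slots reaches the maximum, log(4).
irregular = availability_entropy([15.0, 15.0, 15.0, 15.0])
print(regular, irregular)
```

Under this reading, a lower entropy corresponds to a more regular and therefore more predictable volunteer.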
Few studies have characterized the availability of VNs,
and some have assumed, for simplicity, that the
availability distribution is exponential. In practice,
however, such assumptions about the availability distribution do not match real systems. According to
[12], the hyperexponential distribution and the Weibull distribution represent the availability distribution observed in experiments
accurately. In this paper, the availability of volunteers in the VCS is therefore assumed to follow a
hyperexponential distribution.
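For reference, a hyperexponential distribution mixes several exponential phases: phase i is chosen with probability p_i and has rate lambda_i. A minimal sketch of its density and survival function follows. The function names and the phase values are illustrative assumptions, not parameters fitted in [12] or in this paper.

```python
import math
from typing import Sequence


def _check(probs: Sequence[float], rates: Sequence[float]) -> None:
    if len(probs) != len(rates):
        raise ValueError("probs and rates must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("phase probabilities must sum to 1")
    if any(r <= 0 for r in rates):
        raise ValueError("rates must be positive")


def hyperexp_pdf(x: float, probs: Sequence[float], rates: Sequence[float]) -> float:
    """Density f(x) = sum_i p_i * lambda_i * exp(-lambda_i * x), x >= 0."""
    _check(probs, rates)
    return sum(p * lam * math.exp(-lam * x) for p, lam in zip(probs, rates))


def hyperexp_survival(x: float, probs: Sequence[float], rates: Sequence[float]) -> float:
    """P(X > x) = sum_i p_i * exp(-lambda_i * x): chance availability lasts past x."""
    _check(probs, rates)
    return sum(p * math.exp(-lam * x) for p, lam in zip(probs, rates))


# Illustrative two-phase mix: short sessions (rate 1.0) and long ones (rate 0.1).
probs, rates = [0.4, 0.6], [1.0, 0.1]
print(hyperexp_survival(5.0, probs, rates))
```

With a single phase the functions reduce to the ordinary exponential distribution, which is the simplification that the text above argues against.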
Instead of using availability directly, DR is introduced to
more accurately measure participation patterns and execution. DR measures the degree in which VN is dedicated to
(4)
In Equation 4, DR combines HAR and EAR, which reflect the temporal availability ratio with respect to regularity
and predictability. DR thus measures the quality of participation and execution of a VN, the regularity of the VN, and the
predictability of its HA and EA. The two weight values in Equation 4
set the relative importance of HAR and EAR and can be adjusted by the scheduling policy. DR provides predictions of
durable availability that can be
supplied to a scheduler and used to improve performance.
Section 5 describes the advanced scheduling scheme based on
DR. The system can improve performance because DR allows schedulers to predict the behavior of a VN.
DR improves system reliability and execution completeness
by reducing the occurrence of execution suspension. As a result, the VCS increases performance and decreases
total execution time.
Procedure PreparatoryStage()
    if (Job > WorkLoad) then
        SplitIntoWorkLoad(Job);
    elseif (Job <= WorkLoad) then
        ScheduleBasedOnDR(VN, WorkLoad);
    endif
    while (VN > 0)
        CalculateDR(VN);
    ListOfVN = OrderVN(DR);
    if (SameOrder > 0) then
        ReOrderVN(ListOfVN, EAR);
    endif
    DRSYSmin = getDR(min(ListOfVN));
    DRSYSavg = getDR(avg(ListOfVN));
    DRSYSlavg = getDR(DRSYSavg, lower);
    DRSYSuavg = getDR(DRSYSavg, upper);
    DRSYSmax = getDR(max(ListOfVN));
    ListOfVN = OrderVN(DR);
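The ordering and summary steps of the preparatory stage can be sketched in Python as follows. Several points are assumptions rather than statements from the paper: VNs are ordered with the highest DR first, ties in DR are broken by the higher EAR, and the lower and upper averages are read as the mean DR of the VNs below and at or above the system average. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class VolunteerNode:
    name: str
    dr: float   # Dedication Rate, assumed already calculated
    ear: float  # used only to break ties between equal DR values


def order_by_dr(nodes: List[VolunteerNode]) -> List[VolunteerNode]:
    """Order VNs by DR (highest first); equal DR values are re-ordered by EAR."""
    return sorted(nodes, key=lambda vn: (vn.dr, vn.ear), reverse=True)


def dr_summary(nodes: List[VolunteerNode]) -> Dict[str, float]:
    """System-wide DR statistics: min, lower average, average, upper average, max."""
    values = [vn.dr for vn in nodes]
    avg = mean(values)
    lower = [v for v in values if v < avg]
    upper = [v for v in values if v >= avg]
    return {
        "min": min(values),
        "lavg": mean(lower) if lower else avg,
        "avg": avg,
        "uavg": mean(upper) if upper else avg,
        "max": max(values),
    }


nodes = [
    VolunteerNode("vn-a", dr=0.8, ear=0.5),
    VolunteerNode("vn-b", dr=0.8, ear=0.9),
    VolunteerNode("vn-c", dr=0.2, ear=0.1),
]
print([vn.name for vn in order_by_dr(nodes)])
print(dr_summary(nodes))
```

Sorting on the pair (DR, EAR) performs the tie-break in a single pass, so a separate re-ordering step is not needed in this sketch.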
Procedure FaultTolerance(State)
    if (Check(State == CRASH)) or (Check(State == LEAVE)) then
        DRbasedSchedule(WorkLoad);
    elseif (Check(State == STOP)) then
        while (DelayTimeVN < DelayTimeSYSavg)
            Wait();
        DRbasedSchedule(WorkLoad);
    endif
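The decision made by the fault tolerance procedure can be sketched as a pure function. The sketch assumes that a crash and a leave both trigger rescheduling, and that a stopped VN is waited on only while its delay stays below the system average. The `State` members, the `RUNNING` state, and the returned action strings are hypothetical names for illustration.

```python
from enum import Enum, auto


class State(Enum):
    RUNNING = auto()  # hypothetical state for a VN that needs no action
    CRASH = auto()
    LEAVE = auto()
    STOP = auto()


def fault_tolerance_action(state: State, vn_delay: float, system_avg_delay: float) -> str:
    """Return "reschedule", "wait", or "continue" for one check of a VN."""
    if state in (State.CRASH, State.LEAVE):
        # The VN is gone: hand the workload to another VN chosen by DR.
        return "reschedule"
    if state is State.STOP:
        # Keep waiting only while the VN is delayed less than the system average.
        return "wait" if vn_delay < system_avg_delay else "reschedule"
    return "continue"


print(fault_tolerance_action(State.STOP, vn_delay=3.0, system_avg_delay=5.0))
```

A scheduler would call this function repeatedly for a stopped VN, which reproduces the waiting loop without blocking inside the decision itself.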
and round trip time in the network, derived from EA, EU,
and Network Delay (ND), respectively. The probability of execution success measures the probability of continuous execution.
The FT of the workload is given in Equation 5, where $\{\, n \mid \sum_{i=1}^{n} EA_i \ge T_{WL_j} \,\} \ne \emptyset$, for $n \ge 1$.
during execution due to a machine crash or the departure of the user. When the state is set to the stop state, the scheduler compares the delay time of the VN
with the delay time of the system to decide whether to wait
or reschedule.
$$FT_{WL} = \sum_{i=1}^{n} EA_i + \sum_{i=1}^{n} EU_i + 2\,ND \qquad (5)$$
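As a worked instance of Equation 5, the sketch below adds the execution-available periods, the execution-unavailable periods, and twice the network delay. The function name, the parameter names, and the sample values are assumptions for illustration; the time unit is arbitrary.

```python
from typing import Sequence


def finishing_time(ea_periods: Sequence[float],
                   eu_periods: Sequence[float],
                   network_delay: float) -> float:
    """FT of one workload: sum of EA periods + sum of EU periods + 2 * ND."""
    return sum(ea_periods) + sum(eu_periods) + 2 * network_delay


# Hypothetical workload: executed in two EA periods (30 and 20 minutes),
# suspended during two EU periods (5 and 10 minutes), with ND = 1.5 minutes.
ft = finishing_time([30, 20], [5, 10], 1.5)
print(ft)  # 50 + 15 + 3 = 68.0
```

The EU term is what volatility adds: with no suspension the finishing time reduces to the pure execution time plus the round trip.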
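[Equation (6): left-hand side not recoverable; the surviving terms are \sum_{i=1}^{n} EA_i + \sum_{i=1}^{n} EU_i + 2\,ND.]
[Equation (7): only FT_{WL} and the sums \sum_{i=1}^{n} EA_i and \sum_{j=1}^{m} EA_j are recoverable.]
$$FT_{WL}^{Eager} = \min\Big\{ T_k \;\Big|\; T_k = \sum_{k=1}^{l} \Big( \sum_{i=1}^{n} EA_i + \sum_{i=1}^{n} EU_i + 2\,ND \Big) \Big\} \qquad (8)$$
[Equation (9): left-hand side not recoverable; the surviving right-hand side is the same expression as in Equation (8).]
$$FT_{WL} = \min\Big\{ T_k \;\Big|\; T_k = \sum_{i=1} EA_i + \Big( \min\Big\{ T_j \;\Big|\; T_j = \sum_{j=1}^{m} EU_j \Big\} \Big) + 2\,ND \Big\} \qquad (10)$$
[Equation (11): not recoverable.]
$$TFT_{Job}^{SSDR} = \max\{\, T_k \mid T_k = FT_{WL}^{SSDR} \,\} \qquad (12)$$
[Equation (13): a sum over i = 1, ..., n of p_i times an indexed rate times an exponential whose exponent contains (1 - DR); the remaining symbols are not recoverable.]
[Equation (14): P\{S_{EXE}^{SSDR}\} equals a sum over i = 1, ..., n of p_i times an indexed rate times an exponential; the exponent is not recoverable.]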
so that each allotted application is executed. The experiment begins with the selection of an appropriate VN.
The VNM manages registered VNs and selects a VN by comparing DR. The JS in the CS allots the application workload in
units of ten thousand: each workload is
a task to find prime numbers in a range of ten thousand. Each VN executes workloads continuously, one
after another. The system was tested using over 10,000 workloads while the number of VNs
was increased from 10 to 100. DR was measured from
the HA and EA of each VN every 30 minutes. The experimental results show that the proposed scheduling scheme, SSDR, has merits in both stability and performance. In the implementation, the Job Scheduler adaptively schedules a workload
to each VN using SSDR. To achieve good performance, the
scheduler must allocate work based on predictable VN behavior. Two scheduling schemes were evaluated, the proposed scheduling scheme and an existing Eager scheduling
scheme, both implemented on top of Korea@Home with an additional scheduling module. Using Korea@Home with both
scheduling schemes, we ran large numbers of tasks slotted
with respect to their execution time. Interestingly, the results
showed that the proposed scheme performed well as the number of VNs
increased. The experimental results are shown in Figure 8
and Figure 9, which give the total execution time of the
job. Each job consists of 50 workloads with 10 VNs.
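The experimental workload can be sketched as follows: a job over a numeric range is split into workloads of ten thousand numbers each, and each workload finds the primes in its range. The function names and the trial-division test are assumptions for illustration; the source does not state how the Korea@Home application searches for primes.

```python
from typing import List, Tuple

WORKLOAD_SIZE = 10_000  # one workload covers a range of ten thousand numbers


def split_into_workloads(upper: int, size: int = WORKLOAD_SIZE) -> List[Tuple[int, int]]:
    """Split [0, upper) into half-open ranges of at most `size` numbers."""
    return [(start, min(start + size, upper)) for start in range(0, upper, size)]


def is_prime(n: int) -> bool:
    """Trial division; adequate for the small ranges used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    divisor = 3
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 2
    return True


def primes_in_range(start: int, stop: int) -> List[int]:
    """All primes p with start <= p < stop: the result of one workload."""
    return [n for n in range(start, stop) if is_prime(n)]


workloads = split_into_workloads(25_000)
print(workloads)
print(len(primes_in_range(*workloads[0])))
```

Because the workloads are independent ranges, any of them can be rescheduled to another VN without affecting the rest of the job.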
Total execution time is commonly used as a parameter to measure
performance, since it represents processing power characteristics.
While the Eager scheduling scheme can also show acceptable
performance, it shows weaknesses in terms of the reliability
and stability required in a VCS with a large number of volunteers. In this case, the proposed scheme performs better than the Eager
scheduling scheme.
7. Conclusions
In this paper, SSDR is proposed in order to overcome
limitations such as increased total execution time, decreased performance, and uncertain job completion.
Dynamic properties such as volatility
have been specified. In addition, HA and EA are used to
parameterize a metric that measures time duration, and a method
to determine each value has been given. With these accurate metrics, the credit given to each VN improves performance
over that of earlier studies, as the experimental results
confirm. The analytical results indicate that the proposed SSDR performs well, while
the empirical results show that the proposed scheduling
scheme outperforms existing VCSs that use the Eager
scheduling scheme.
References
[1] Luis F. G. Sarmenta, Volunteer Computing, Ph.D. thesis.
Dept. of Electrical Engineering and Computer Science, MIT,
2001.
[2] Bernd O. Christiansen, Peter Cappello, Mihai F. Ionescu,
Michael O. Neary, Klaus E. Schauser, Daniel Wu. Javelin:
Internet-Based Parallel Computing using Java, Concurrency:
Practice and Experience, 1998.
[3] G. Fedak, C. Germain, V. Neri and F. Cappello. XtremWeb :
A Generic Global Computing System. CCGRID2001, 2001.
[4] A.Baratloo, M.Karaul, Z.Kedem, and P.Wyckoff. Charlotte:
Metacomputing on the web. In proceedings of 9th Conference
on Parallel and Distributed Computing System, 1996.
[5] N. Camiel, S. London, N. Nisan, and O. Regev. The POPCORN Project: Distributed Computation over the Internet in
Java. In 6th International WWW Conference, 1997.
[6] Korea@Home homepage. http://www.koreaathome.co.kr
[7] D. P. Anderson, J. Cobb, E. Korpela, M. Lebofsky, D. Werthimer,
SETI@home: an experiment in public-resource computing, Communications of the ACM, 2002.
[8] L. F. Lau, A.L. Ananda, G. Tan, W. F. Wong, Gucha :
Internet-based parallel computing using Java, ICA3PP, 2000.
[9] H. Casanova, A. Legrand, D. Zagorodnov, and F. Berman.
Heuristics for Scheduling Parameter Sweep Applications in
GRID Environments. HCW00, 2000.
[10] R. Wolski, N. Spring, and J. Hayes. The Network Weather
Service: A Distributed Resource Performance Forecasting
Service for Metacomputing, Future Generation Computing
Systems, 1999.
[11] Bhagwan, Savage, and Voelker, Understanding availability, IPTPS, 2003
[12] John Brevik, Daniel Nurmi, Rich Wolski, Modeling machine availability in enterprise and wide-area distributed computing environment, Technical Report CS2003-28, 2003.
[13] Jacky Chu et al., Availability and Popularity Measurements
of Peer-to-Peer File systems, Technical Report 04-36, 2004.
[14] Derrick Kondo, Michela Taufer, John Karanicolas, Charles
L. Brooks, Henri Casanova and Andrew Chien, Characterizing and Evaluating Desktop GRIDs - An Empirical Study,
IPDPS, 2004.