
(IJCNS) International Journal of Computer and Network Security, Vol. 2, No. 10, 2010

Evaluation and Improvement of a Resource Retrieving Method Using Reinforcement Learning in Grid Networks
Erfan Shams¹, Abolfazl Toroghi Haghighat² and Ehsan Ghamari³

¹ Islamic Azad University Qazvin Branch, School of Computer Engineering, Qazvin, Iran, Eshams1364@yahoo.com
² Islamic Azad University Qazvin Branch, School of Computer Engineering, Qazvin, Iran, at_haghighat@yahoo.com
³ Islamic Azad University Qazvin Branch, School of Computer Engineering, Qazvin, Iran, ehsan.ghamari@gmail.com

Abstract: Grid computing enables virtual organizations to share geographically distributed resources in order to achieve common goals, generally without a fixed location, central control, or pre-established trust relationships. To solve problems on a grid, the most suitable resource must be found in the shortest possible time, and this search is itself applied as part of the problem-solving process. All of the dedicated information retrieval approaches try to serve every request optimally and in the shortest time, but they cannot adapt to changes in the grid network, so flexibility in retrieval and resource allocation is necessary. In this paper, a new component is inserted into the protocol to manage decision control based on reinforcement learning. By learning patterns in the grid network, recognizing the state space and the agents, tracking the number of agents, and gathering resource information from the grid, this component performs retrieval and resource allocation more optimally than other methods.

Keywords: resource retrieving, grid, reinforcement learning.
1. Introduction
Grid computing is a computational model in which huge computations can be processed using the computing power of many networked computers, while keeping them in view as a single virtual machine. In other words, a grid can solve enormous computational problems using the computing power of several separate computers that are usually connected through a network (the Internet) [1, 3]. The distributed network with grid architecture is one of the most important current topics in computer networking. Given the growth of computer applications and the rapid advancement of hardware, it has become essential to build integrated systems from free, heterogeneous resources for multi-purpose processing while using those resources at maximum efficiency. Since many investigative and computational applications use these networks to solve bottleneck problems, providing users with optimal resources to execute their processes is a significant issue in the field of distributed networks with grid architecture.

In this paper we consider a component, called decision control management, based on reinforcement learning and located in the resource management unit of the grid. It improves our knowledge of the resources as time passes and maps resources onto requests optimally and rapidly. We use reinforcement learning because it matches the nature of a grid network: it is an online learning method that learns from an observable environment, and it does not depend on a large mass of information for a supervisor (as neural-network and genetic-algorithm based methods do). In our framework, rewards are collected along the traversed path, so by repeating this process the recognition of each resource (node) becomes stronger and we understand the network better.

In the following, we introduce the new approach and show, through comparison and testing, the importance and efficiency of our work relative to previous studies.
2. Introduction of information retrieving in grid network

Requests are served in one of two ways: (1) real-time and (2) non-real-time. In the real-time method, serving starts immediately: as soon as the request is received from the broker, every resource that satisfies the stated conditions sends a message to the broker, the broker selects the best resource based on factors such as distance, and the request is sent to that resource for serving. Obviously, this method makes the network traffic heavier. In the non-real-time method, information about the resources is already available at the time the request is reported: the discovered resources have been stored by the brokers in the resource management part, the optimal resources are searched and selected as soon as the request is reported, and the request is then sent to the selected resources. In this way the network traffic decreases significantly and requests are answered more quickly.
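As a rough illustration of the non-real-time mode, the following sketch shows a broker that keeps a table of previously discovered resources and answers requests from that table; the class and field names (`Resource`, `Broker`, `cpu`, `bandwidth`) are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    node_id: str
    cpu: float        # processing power (arbitrary units)
    memory: int       # MB of free execution memory
    bandwidth: float  # Mbit/s to this node

class Broker:
    """Non-real-time serving: answer requests from stored resource information."""
    def __init__(self):
        self.table = {}                      # node_id -> Resource, filled by discovery

    def register(self, res: Resource):
        self.table[res.node_id] = res

    def serve(self, min_cpu, min_mem, min_bw):
        # Select the best matching resource from the stored table instead of
        # querying the network at request time.
        candidates = [r for r in self.table.values()
                      if r.cpu >= min_cpu and r.memory >= min_mem and r.bandwidth >= min_bw]
        return max(candidates, key=lambda r: (r.cpu, r.bandwidth), default=None)
```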
2.1 Breadth First Search (BFS)

BFS is one of the simplest and most commonly used search methods and has a straightforward procedure. Every node that holds a request sends it to all of its neighbors, and also searches its own local information for a suitable answer. The same steps are repeated at every node that receives the request; when the required resource is found, a message is returned to the requester through the nearest node that replied [8]. The disadvantage of this method is the heavy traffic it creates. Assume there are n nodes and every node has m neighbors. In the first step the broker sends a message to its m neighbors; after a few more steps an enormous number of messages has been sent, and the network becomes congested.
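A minimal sketch of this flooding behaviour, assuming an adjacency-list view of the overlay and a hypothetical `has_resource` predicate on each node (neither is specified in the paper):

```python
from collections import deque

def bfs_discover(graph, start, has_resource, max_hops=4):
    """Flood a resource query hop by hop; every visited node forwards it
    to all of its neighbors, which is what makes the traffic grow quickly."""
    visited, frontier = {start}, deque([(start, 0)])
    messages = 0
    while frontier:
        node, hops = frontier.popleft()
        if has_resource(node):
            return node, messages          # the reply travels back to the requester
        if hops == max_hops:
            continue
        for nb in graph[node]:             # send the request to every neighbor
            messages += 1
            if nb not in visited:
                visited.add(nb)
                frontier.append((nb, hops + 1))
    return None, messages
```

With m neighbors per node, the number of forwarded messages grows rapidly with each additional hop, which is the congestion problem described above.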
2.2 Random Breadth First Search (RBFS)

This method is similar to the previous one, with the difference that a node does not send the message to all of its neighbors at every step; instead, it sends the query only to a subset of them. The method has both advantages and disadvantages. Lower network traffic and faster search are the advantages; since the nodes have no information about the neighbors to which the message is sent, no time is wasted on checking and deciding, and every node simply selects some of its neighbors at random and forwards the message to them. Random neighbor selection is also the significant disadvantage of this method, because the dead parts of the network, which are only weakly connected, are almost never queried.
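Under the same assumptions as the BFS sketch above, RBFS changes only the forwarding step: each node forwards the query to a random sample of its neighbors (the fan-out value below is illustrative, not taken from the paper).

```python
import random

def rbfs_forward(graph, node, fanout=2):
    """Forward the query to a random subset of neighbors instead of all of them."""
    neighbors = list(graph[node])
    return random.sample(neighbors, min(fanout, len(neighbors)))
```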
2.3 Random Breadth First Search with RND-Step

This method is an improvement of the previous ones: the search is started from n brokers instead of only one (n depends on the allowed number of steps), and each of those n nodes then searches for free resources. Its disadvantages are the same as those of RBFS; because the search step is random, optimal results are not guaranteed. On the other hand, searching along several paths (with a roughly linear increase in the number of neighbors visited) makes the efficiency higher.
2.4 Searching with keeping information

Unlike the three methods above, this method answers requests in non-real-time mode. Several approaches in this family keep track of the status of the neighbors and answer requests from stored information, including directed breadth first search (DBFS) and hashing-based methods. Their efficiency is higher than that of the random-based methods: they significantly decrease the network traffic and the number of queries, and they can therefore find the required resources quickly enough for timely responses [10, 12]. The notable difference between our method and the ones above is the use of reinforcement learning, which builds up a good recognition of the resources as time passes and is therefore able to allocate them optimally.
3. Reinforcement learning

Reinforcement learning [13] is, in general, the art of finding a strategy that improves the current situation toward a given goal, based on recognition of the environment, the results of interacting with it, and the benefits and costs of the tasks that can be performed. Put simply, reinforcement learning is learning through interaction with the environment in order to reach a specified goal. The decision maker and learner is called the agent; everything the agent interacts with (in fact, everything external to the agent) is called the environment. The interaction happens continuously: the agent makes a decision and performs an action, the environment responds by granting a reward, and finally the agent is transferred to a new state.

In more detail, the agent and the environment interact over a sequence of time steps t = 1, 2, 3, .... At every step t the agent receives a new state from the environment. In this paper the state space of the grid is S, so s_t ∈ S, where S is the set of possible states of the resource allocation environment, and a_t is an action from the set of actions the agent can take in state s_t. In the next step the environment grants a reward r_{t+1} ∈ R and, based on its previous action, the agent is transferred to the new state s_{t+1}. Mathematically, a policy is a mapping π : S × A → [0, 1]; it assigns a number in [0, 1] to every state-action pair (s, a), written π(s, a):

Pr{a_t = a | s_t = s} = π(s, a).
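A minimal sketch of this agent-environment loop with a tabular stochastic policy; the environment interface (`reset`, `step`) is an assumption made for illustration and is not part of the paper's protocol.

```python
import random

def run_episode(env, policy, steps=10):
    """policy[s] is a dict mapping action -> probability, i.e. pi(s, a)."""
    s = env.reset()
    total_reward = 0.0
    for _ in range(steps):
        actions, probs = zip(*policy[s].items())
        a = random.choices(actions, weights=probs)[0]   # sample a ~ pi(s, .)
        s, r = env.step(a)                              # environment grants reward and new state
        total_reward += r
    return total_reward
```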
The sequence of states, actions, and successive rewards considered in reinforcement learning is shown in Figure 1.

Figure 1. The sequence of states, actions, and rewards.

The value of a state s is given by a state-value function. Computing the true value of every state while following a policy π is known as policy evaluation and is necessary for complete learning; a corresponding value can also be defined for each state-action pair (s, a).
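The two value functions referred to here appear only as images in the original; presumably they are the standard definitions from [13], sketched below with a discount factor γ (an assumption, since the paper does not state its discounting).

```latex
V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[\sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1} \,\middle|\, s_t = s\right],
\qquad
Q^{\pi}(s,a) = \mathbb{E}_{\pi}\!\left[\sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1} \,\middle|\, s_t = s,\; a_t = a\right].
```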

Example: suppose we want the reinforcement learning system to learn which resource is best suited to serve each request. According to the amount and time of processing needed to allocate resources, we initialize this unit with the scores +1 (appropriate), -1 (inappropriate) and 0 (intermediate). Note that the rewards must be granted in such a way that the agent satisfies us by maximizing its reward; it should not learn some other, unintended way of satisfying the reward signal. In this example, for instance, the score +1 is granted when the best resource is selected for processing.

4. Proposed approach to improve retrieving resources
In this paper it is assumed that, at first, there is no exact information about the grid network environment (about the individual nodes); in general, resource allocation becomes more optimal by learning patterns in the grid network, recognizing the state space, the agent state s_t and the agent's actions, and gathering resource information from the grid. There are three general decision states for allocating a resource to the agent:

• The resource has suitable processing power and execution memory as well as suitable bandwidth.
• The resource does not have suitable processing power and execution memory, but there is suitable bandwidth (note that in this case two components of the resource are not recognized as appropriate for processing information, so to keep the network processing rate stable this resource is not recorded in the RLDU).
• The resource has suitable processing power and execution memory, but the available bandwidth is not suitable (note that such a resource is recognized as proper for small processing tasks, and the resource allocation unit assigns small processing tasks to it).

Thus the state space is S = {Good Node, Bad Node, Normal Node} and the action space is A = {Assign the process to this resource, Do not assign the process to this resource}, as sketched in the classification example below.
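A minimal sketch of this three-way classification. The mapping of the three cases onto Good/Normal/Bad follows the descriptions in the bullets above but is an interpretation, and the boolean inputs stand in for threshold checks that the paper does not specify.

```python
def classify_node(cpu_ok: bool, mem_ok: bool, bw_ok: bool) -> str:
    """Map the three decision criteria onto the state space S."""
    if cpu_ok and mem_ok and bw_ok:
        return "Good Node"        # suitable for any process
    if cpu_ok and mem_ok and not bw_ok:
        return "Normal Node"      # kept for small processing tasks only
    return "Bad Node"             # not recorded in the RLDU
```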
To optimize resource allocation in a grid network, the choice of resource directly affects the efficiency of the network and of resource management, and the best resource is selected over the three general states of the state space S described above. In reinforcement learning the agent's goal is expressed as a reward signal perceived from the environment; at every time step this reward is a simple number r_t ∈ R. Simply put, the agent's goal is to maximize the total of these rewards, and this maximization should be achieved in the long term, which does not mean maximizing the reward at every single step. To use this method for resource allocation, we assume that a unit is added to the grid network layout which rewards or punishes the resource allocation unit based on two factors: (1) the processing time granted to the resource, and (2) the information recorded about the resource in workload management. Figure 2 shows how the two newly proposed units are added to the grid layout.

In this figure, two different units are added to the grid protocol; together they make the resource allocation decision for each process.

Figure 2. Adding the reward and punishment units to the grid model.

Reinforcement learning decision unit (RLDU): this unit is the reinforcement learning core, modeled on a Markov model and dynamic programming, and it interacts dynamically with the three other layers (scheduler, broker, GRAM). It decides whether to assign a resource according to the two factors above: the time taken by the resource for previously mapped processing, and the information recorded in the unit, such as processing rate and data transfer bandwidth. The RLDU announces the proposed resources (its output) to the grid resource manager (GRAM). The reward/punishment (RE/PE) unit grants the score +1 to the RLDU for a suitable resource allocation for the processing, and otherwise grants the score -1.
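A rough sketch of how the RLDU and RE/PE units could cooperate: the RLDU keeps a learned score per resource and the RE/PE unit feeds back +1, -1, or -2 after each allocation. The class names, the learning-rate value, and the update rule are illustrative assumptions; the paper describes the units only at the block-diagram level.

```python
class RLDU:
    """Reinforcement learning decision unit: keeps a learned score per resource."""
    def __init__(self, learning_rate=0.1):
        self.scores = {}             # node_id -> learned value
        self.lr = learning_rate

    def propose(self, candidates):
        # Announce the highest-valued known resource to GRAM.
        return max(candidates, key=lambda n: self.scores.get(n, 0.0))

    def feedback(self, node_id, reward):
        # The RE/PE unit calls this with +1 (suitable), -1 (unsuitable) or -2 (node vanished).
        old = self.scores.get(node_id, 0.0)
        self.scores[node_id] = old + self.lr * (reward - old)

class REPE:
    """Reward/punishment unit: scores each completed allocation."""
    def __init__(self, rldu: RLDU):
        self.rldu = rldu

    def judge(self, node_id, finished_on_time: bool, still_online: bool):
        if not still_online:
            self.rldu.feedback(node_id, -2.0)
        else:
            self.rldu.feedback(node_id, +1.0 if finished_on_time else -1.0)
```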
4.1 Resource classification

Information is stored in the RLDU for every mapping of a resource; it includes the processing power, the bandwidth, the accumulated score from the reward/punishment unit, and the resource's class according to the three conditions above, and it is recorded in the RLDU before a resource is allocated, in the same way as at the time of searching and comparing resource information. If no suitable resource exists for a process, the request is queued until a suitable resource is found. The main reason for this strategy is to keep resource allocation optimal and profitable: if the request were delivered to an inappropriate resource, it would probably not be completed on time, and the resource allocation and planning would have to be repeated, which is a disadvantage.
In this method the score is granted to the resources, not to the request; hence no score is granted to the resources in the RLDU until resource allocation and request planning have been performed. To avoid the information volume in the RLDU growing too large, a selection function can be used to choose a batch of resources to consult. For example, we first separate 20 resources that match the request from the resource bank and then search within that batch for the optimal resource, so that the search time decreases. After a short period during which information is collected over the network, the resources are classified into three batches based on the classification space defined above. The dynamic nature of the grid must also be considered, since the resource of any system may go offline unexpectedly. In that case the corresponding node must be deleted from the classification; because the node does not respond to the RLDU, the RE/PE unit grants it the score -2 and the resource is removed. The resource information is nevertheless kept permanently up to date by the broker, so if a deleted resource comes back to the network it is classified again. The RE/PE unit also grants the score -2 whenever resource allocation and planning have been completed and the request has been sent to the resource, but the corresponding resource leaves the network before the request is delivered. A sketch of this pre-filtering and queueing behaviour follows.
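A minimal sketch of the selection function and queueing strategy described above. The batch size of 20 comes from the paper; the function and field names (`memory_needed`), the numeric thresholds, and the reuse of the hypothetical `classify_node` and `RLDU` helpers from the earlier sketches are all illustrative assumptions.

```python
from collections import deque

pending = deque()                      # requests waiting for a suitable resource

def select_batch(bank, request, batch_size=20):
    """Pre-filter: pick up to 20 resources from the bank that match the request."""
    matching = [r for r in bank if r.memory >= request.memory_needed]
    return matching[:batch_size]

def allocate(rldu, bank, request):
    batch = select_batch(bank, request)
    # Keep only nodes whose class is usable (thresholds are placeholders).
    good = [r for r in batch
            if classify_node(r.cpu > 1.0, r.memory > 512, r.bandwidth > 10.0) != "Bad Node"]
    if not good:
        pending.append(request)        # queue the request until a suitable resource appears
        return None
    return rldu.propose([r.node_id for r in good])
```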
5. Experimental results and evaluation
The simulation results show that the proposed approach is well matched to a distributed network with grid architecture and that its efficiency remains stable. Compared with the other search and allocation methods, our approach has higher efficiency and optimality. In addition, because it uses an intelligent agent that interacts with the other units, the proposed approach adapts to the dynamic, heterogeneous grid network. Using the limiting selection function we are able to decrease the rate of searching the recorded information in the RLDU; to achieve this reduction we can adopt a search strategy that applies different methods for selecting a resource, optimizing the search time and the resource selection simultaneously.

Figure 3 shows a number of search methods in the search space, using the proposed reinforcement learning model, compared with the others.

Figure 3. Optimization of resource allocation based on reinforcement learning.

As shown, the search time is decreased compared with the existing methods. The improvement is evaluated both in terms of time and in terms of selecting the optimal resource for a request. In Figure 4 the search methods are compared with the proposed approach in a distributed grid network; as a result, our work is optimal in terms of search time and cost, improving the results by about 15%, which is significant in both time and cost.

Figure 4. Comparison of the search methods with the presented approach.
6. Conclusion

In this paper we have proposed an approach for retrieving resources and have verified its efficiency in order to evaluate resource retrieving methods in grid networks. In general we conclude that:

• The efficiency of the presented work becomes more optimal as time passes and resource recognition improves.
• When the network changes significantly, with many nodes joining and leaving, the efficiency of our work decreases.

In future work we intend to improve the efficiency of our approach and to present further resource search methods for the network, using a genetic algorithm and reinforcement learning at the same time.
References

[1] B. Jacob, L. Ferreira, N. Bieberstein, C. Glizean, J. Girars, R. Strachowski, and S. Yu, Enabling Applications for Grid Computing with Globus, IBM Redbooks, 2003.
[2] I. Foster, C. Kesselman, and S. Tuecke, "The Anatomy of the Grid: Enabling Scalable Virtual Organizations", International Journal of High Performance Computing Applications, 15 (3), pp. 200-222, 2001.
[3] A. Abbas, Grid Computing: A Practical Guide to Technology and Applications, Charles River Media, 2004.
[4] M. D. Dikaiakos, "Grid benchmarking: vision, challenges, and current status", Concurrency and Computation: Practice & Experience, 19 (1), pp. 89-105, Jan. 2007.
[5] "How does the Grid work?", Available: http://gridcafe.web.cern.ch/gridcafe/openday/How-works.html.
[6] R. Buyya and S. Venugopal, "A Gentle Introduction to Grid Computing and Technologies", CSI Communications, Computer Society of India, July 2005.
[7] "What is the difference between the Internet, the Web and the Grid?", Available: http://gridcafe.web.cern.ch/gridcafe/openday/web-grid.html.
[8] "Job life-cycle", Available: http://gridcafe.web.cern.ch/gridcafe/openday/New-JobCycle.html.
[9] D. Johnson, "Desktop Grids", Entropia.
[10] R. Buyya, "Grid Technologies and Resource Management Systems", PhD thesis, School of Computer Science and Software Engineering, Monash University, Melbourne, Australia, April 12, 2002.
[11] "Sun Powers the Grid: An Overview of Grid Computing", Sun Microsystems, 2001.
[12] I. Foster, "What is the Grid? A Three Point Checklist", Argonne National Laboratory & University of Chicago, July 20, 2002.
[13] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 1998.
