
The Scalability of Spatial Reuse Based Serial Storage Interfaces

Tai-Sheng Chang, Sangyup Shim and David H.C. Du
{tchang, shim, du}@cs.umn.edu
Distributed Multimedia Research Center, Department of Computer Science, University of Minnesota

Abstract
Due to the growing popularity of emerging applications such as digital libraries, Video-On-Demand, distance learning, and the Internet World-Wide Web, multimedia servers with large-capacity, high-performance storage subsystems are in high demand. Serial storage interfaces are emerging technologies designed to improve the performance of such storage subsystems. They provide high bandwidth, fault tolerance, fair bandwidth sharing and long-distance connection capability. All of these issues are critical in designing a scalable and high-performance storage subsystem. Some of the serial storage interfaces provide the spatial reuse feature, which allows multiple concurrent transmissions. That is, multiple hosts can access disks concurrently at full link bandwidth if their access paths are disjoint. Spatial reuse provides a way to build a storage subsystem whose aggregate bandwidth may be scaled up with the number of hosts. However, it is not clear how much the performance of a storage subsystem can be improved by spatial reuse under different configurations and traffic scenarios. Both the limitations and the capability of this scalability need to be investigated. To understand their fundamental performance characteristics, we derive an analytic model for serial storage interfaces with the spatial reuse feature. Based on this model, we investigate the maximum aggregate throughput for different system configurations and load distributions. We show how the number of disks needed to saturate a loop varies with the number of hosts and the load scenario. We also show how load balancing by uniformly distributing the load to all the disks on a loop may incur high overhead. This is because accesses to far-away disks must go through many links and consume bandwidth on each link they traverse. The results show the achievable throughput may be reduced by more than half in some cases.

1 Introduction

Serial storage interfaces are emerging technologies designed to improve the current storage subsystem. They provide high bandwidth, fault tolerance, fair bandwidth sharing and long-distance connection capability ([3]). All of these need to be considered in designing high-performance and scalable storage subsystems. Their addressing schemes are also capable of connecting at least hundreds of storage devices on the same channel. There are several serial storage interface standards being proposed. Based on the way they access the shared channel, these interfaces can be grouped into two categories. In the first category, the whole channel is shared by all the nodes (which could be either hosts or devices) attached to the channel in a time-division multiplexing fashion. Therefore, only one node can send data on the channel at any given time. The total achievable throughput is hence limited by the link bandwidth. Thus, this type of standard usually has a higher link bandwidth. FC-AL [7, 8, 9, 3, 4] belongs to this category. Figure 1(a) shows an example of a dual loop with 100 MB/sec in each direction. The maximum possible throughput is 200 MB/sec when the loops in different directions can be operated independently. The interfaces in the other category share every link in a loop independently. Since each link is operated independently, multiple pairs of nodes can exchange information simultaneously as long as their access paths do not overlap. This feature is called spatial reuse. The serial interfaces having this feature include SSA [1, 2, 3, 4], the Fibre Channel Aaron Proposal [5], and the Fibre Channel TORN Proposal [6]. Figure 1(b) shows an example in which 200 MB/sec is the maximum possible throughput that can be achieved with 40 MB/sec links but with spatial reuse. Although spatial reuse may provide a potential aggregate bandwidth equal to several times the link bandwidth, as shown in Figure 1, one may not be able to take advantage of spatial reuse and expect the aggregate bandwidth to scale up well simply by connecting a large number of hosts and disks in a loop. The aggregate bandwidth obtained by spatial reuse is greatly affected by the access patterns of the hosts and the load distributions of the disks. For example, if two hosts are connected directly next to each other, the configuration cannot fully take advantage of spatial reuse. Figure 2(a) shows an example of this configuration: the data retrieval from storage going to Host B through link b1 also consumes the bandwidth of link a2. Furthermore, the number of disks between two hosts also limits the effective throughput, since the throughput of a disk is less than that of a link. Figure 2(b) shows a configuration in which only one disk is connected between Hosts A and B. In this configuration, neither link a2 nor b1 can be effectively used. Therefore, what an appropriate configuration should be in order to take advantage of the spatial reuse feature becomes an important question in designing a scalable storage subsystem. Other questions include: how many disks can be added to a loop before saturating it? How does a system scale up when we add more disks or hosts to a loop? Under what conditions will it under-utilize the disks and waste disk bandwidth? All these questions need to be answered in order to fully take advantage of spatial reuse. Load distribution is another factor that affects the scalability. The load distribution on the hosts, disks and links needs to be properly arranged.
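To make the link-consumption effect concrete, the following small sketch (our own illustration, not part of the paper; the node numbering, transfer lists and rates are arbitrary) computes the per-link load on a unidirectional loop for a set of concurrent transfers. Disjoint paths reuse the loop, while overlapping paths share link bandwidth.

```python
def link_loads(n_nodes, transfers):
    """Per-link load on a unidirectional loop, illustrating spatial reuse.

    n_nodes   : total nodes (hosts + disks) on the loop, numbered 0..n_nodes-1
    transfers : list of (src, dst, rate) tuples; data travel from src towards
                increasing node numbers (mod n_nodes) and are absorbed at dst
    Returns a dict mapping link i (node i -> node i+1) to its load.
    """
    load = {i: 0.0 for i in range(n_nodes)}
    for src, dst, rate in transfers:
        node = src
        while node != dst:            # consume bandwidth on every link crossed
            load[node] += rate
            node = (node + 1) % n_nodes
    return load

# Two disjoint 40 MB/s transfers proceed concurrently (cf. Figure 1(b)) ...
print(link_loads(6, [(0, 2, 40.0), (3, 5, 40.0)]))
# ... while two overlapping transfers must share links 2 and 3 (cf. Figure 2(a)).
print(link_loads(6, [(0, 4, 40.0), (2, 4, 40.0)]))
```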

Figure 1: A comparison of serial storage interfaces with and without spatial reuse. (a) A system without spatial reuse (only one transmission allowed in each direction, 100 MB/sec links): total aggregate throughput = 200 MB/sec. (b) A system with spatial reuse (multiple transmissions allowed in each direction, 40 MB/sec links): total aggregate throughput = 200 MB/sec in this example.

In order to fully utilize all the links, the load distributions among the hosts and disks have to be well balanced. For example, if the requests from all the hosts go to the same disk, the total throughput that can be achieved will be no more than the throughput of that disk. On the other hand, if all the requests come from a single host, the total achievable throughput will be limited by the bandwidth of the links connected to that host. In either case, the advantage of spatial reuse will be minimal. The load distribution is also affected by how the disks are shared among the hosts. A host accessing all the disks on a loop creates a different load on the links than a host that only accesses the disks close to it. The ratio of read and write operations each host generates also creates different loads on a loop. How the ratio of reads and writes affects the scalability of the storage system design therefore needs to be studied. Many well-known computer networks have a similar loop or ring topology. For example, Token Ring [11] also has a ring topology and uses a token to determine which node is allowed to transmit data on the loop. It requires a dedicated ring for each data transmission. This is similar to FC-AL, which requires a dedicated loop for every data transmission. Therefore, the achievable throughput is limited by the link bandwidth. Slotted Rings, Register Insertion Rings [11], MetaRing [12] and MetaNet [13] networks have a similar spatial reuse feature. They differ in the way they regulate data transmission on the loop. Although many studies have been done on these network protocols, the scalability of serial storage interfaces based on spatial reuse is not clear. Storage networks have different characteristics from computer networks. First, the traffic patterns are different. Storage devices are usually passive devices which do not initiate data transmission. Storage devices such as disk and tape drives only read or write data when there is a request from a host. Therefore, as opposed to computer networks where each node may communicate with any other node, the traffic on a storage network is mainly between a host and a storage device. The hosts and the disks have different impacts on the performance of the storage system. It is not clear how the scalability of spatial reuse based serial storage will be affected by different numbers of hosts and disks on the loop. Second, reading from (or writing to) the physical media on storage devices usually has a much longer latency than reading from the memory of a computer. Its impact on the scalability of storage systems is not clear either. To understand the scalability of spatial reuse based serial storage interfaces, we derive an analytic model for such interfaces. We use this model to study the performance and the scalability of such storage networks under different loads and traffic scenarios. The remainder of this paper is organized as follows. In Section 2, we describe an analytic model for serial storage interfaces with spatial reuse. We cover scenarios with different read and write ratios, load distributions, and degrees of disk sharing. In Section 3, we show the numerical results of the model for different configurations and load distributions. In Section 4, we conclude this paper and suggest some future work.

2 An Analytic Model of a Serial Storage Interface


In this section, we present the analytic model that we derived for serial storage interfaces with spatial reuse. We start by analyzing the traffic going through the links under a traffic scenario in which all the requests are reads. In Section 2.2, we show a detailed analysis of how we derive the model for this scenario. In Section 2.3, we include several extensions for more general traffic loads: a traffic scenario with mixed reads and writes is discussed there, and different degrees of disk sharing and different disk load distributions are the other two traffic scenarios we include in our analytic model.

2.1 Configuration and Assumptions

We assume there are $N_h$ hosts and $N_d$ storage devices.

Figure 2: Two examples showing that the bandwidth may not be efficiently used. (a) A system with two hosts connected right next to each other will not effectively use either link a2 or b1. (b) A system with only one disk (20 MB/sec) connected between two hosts will also have low effectiveness on links a2 and b1.

In our analytic model, we assume each node (either a device or a host) has two input links and two output links. The devices and hosts are connected to form a loop, with every node having an input and an output link directly to each of its neighbor nodes. The resulting loop provides two paths between each pair of source and destination nodes. In this paper, we assume the shortest path (in terms of the number of hops) is chosen for each communication. The destination node is responsible for absorbing the traffic on the loop; that is, traffic only traverses the links between the source and the destination node. We assume the $N_d$ storage devices are evenly distributed among the $N_h$ hosts, so that there are $N_d/N_h$ storage devices between any two neighboring hosts. We assume each link is operated independently; that is, each node can send and receive data to and from its neighbors regardless of the status of the other links. Therefore, multiple concurrent transmissions can occur simultaneously. We assume there is a mechanism to ensure that all the connections (communications between a source and a destination node) competing for the same link get an equal share of the link bandwidth. In the following analysis, we assume the link bandwidth $B_L$ is the effective bandwidth for data. That means none of the control traffic, such as read and write request commands, frame headers, and the acknowledgment and flow control traffic if applicable, consumes any part of the effective link bandwidth $B_L$. In the remainder of this paper, we use link bandwidth to mean the effective link bandwidth for simplicity. We assume a high load scenario in which the load among the disks and hosts is uniformly distributed. Therefore, not only does each host generate the same load on each storage device, but each storage device is also evenly shared by all the hosts. If we denote the throughput that a disk can generate in a high load scenario by $B_d$, each host will get $B_d/N_h$ of throughput from each storage device. In this paper, we only consider traditional read and write requests. We do not consider any third-party commands such as Copy commands when analyzing the achievable throughput.

2.2 Pure Read

In this subsection, we analyze the traffic and the achievable throughput under a pure read traffic scenario. Each host is assumed to generate an even load on all the disks. For simplicity, we only discuss the traffic in one direction. The analysis of the traffic in the other direction is the same because of the symmetric configuration and load distribution and the link independence. Figure 3 shows a general loop configuration with only the counter-clockwise links. For the pure read traffic scenario, a link connecting to a host carries a higher load than a link connecting two storage devices under our uniform load assumption. Therefore, we first analyze the traffic going through Link a (the input link to Host H) in Figure 3. We categorize the traffic going through Link a into two categories. One category is the traffic read by Host H. This traffic is not forwarded after it arrives at Host H; it contributes to the total throughput of Host H through Link a. We call this category the Effective Traffic (to Host H). The other traffic going through Link a is the data read by other hosts, which is forwarded to Link b after arriving from Link a. We call this traffic the Cross Traffic (to Host H). The Effective Traffic (of H) consists of data read by Host H from $N_d/2$ disks. (The data read from the other $N_d/2$ disks go through the clockwise direction.)
Based on the uniform load distribution assumptions mentioned before, each disk delivers $1/N_h$ of its throughput to each host. Therefore, each of those $N_d/2$ disks contributes $B_d/N_h$ to the Effective Traffic of Host H. The total bandwidth requirement for the Effective Traffic of Host H is therefore equal to $\mathrm{Traffic}_{Effective}$, which is defined in (1):

$$\mathrm{Traffic}_{Effective} = \frac{N_d}{2}\,\frac{B_d}{N_h} \qquad (1)$$

Figure 3: An analytic model for serial storage interfaces with spatial reuse. (Only the counter-clockwise links are shown; there are $N_d/N_h$ disks between neighboring hosts. The Effective Traffic to Link a is $\frac{N_d}{2}\frac{B_d}{N_h}$, and the Cross Traffic to Link a is $\frac{1}{2}\frac{N_h}{2}\left(\frac{N_h}{2}-1\right)\frac{N_d}{N_h}\frac{B_d}{N_h}$.)

For the Cross Traffic, there are $\frac{N_d}{2} - \frac{N_d}{N_h}$ disks to the left of Host H (in Figure 3) sending data through Link a to Host (H+1), which is the first host to the right of H. Also, $\frac{N_d}{2} - 2\frac{N_d}{N_h}$ disks to the left of Host H send data to Host (H+2) through Link a. Similarly, $\frac{N_d}{2} - i\frac{N_d}{N_h}$ disks to the left of Host H send data to Host (H+i) through Link a. The total bandwidth requirement of this Cross Traffic is equal to $\mathrm{Traffic}_{Cross}$, as defined in (2). (Without loss of generality, we assume $N_h$ is an even number for the rest of this section.)

$$\mathrm{Traffic}_{Cross} = \sum_{i=1}^{N_h/2-1}\left(\frac{N_d}{2} - i\,\frac{N_d}{N_h}\right)\frac{B_d}{N_h} = \frac{1}{2}\,\frac{N_h}{2}\left(\frac{N_h}{2}-1\right)\frac{N_d}{N_h}\,\frac{B_d}{N_h} \qquad (2)$$

When the total bandwidth requirement of the Effective Traffic and Cross Traffic through Link a is smaller than the bandwidth of Link a (which is $B_L$), the throughput of H through Link a is the same as $\mathrm{Traffic}_{Effective}$. Otherwise, the Effective and Cross Traffic are bounded by the link bandwidth. In such a case, the devices are not allowed to send at as high a data rate as they could without this constraint. As a result, the disk throughput is reduced such that the sum of the Effective and Cross Traffic equals the link bandwidth $B_L$. The throughput of Host H through Link a is then the resulting disk throughput $B_d$ divided by $N_h$ and multiplied by $N_d/2$. The throughput of Host H through Link a can thus be expressed as in (3):

$$\mathrm{Throughput}_{H}(a) = \frac{N_d}{2}\,\frac{B_d}{N_h} \quad \text{subject to} \quad \frac{N_d B_d}{2 N_h} + \frac{1}{2}\,\frac{N_h}{2}\left(\frac{N_h}{2}-1\right)\frac{N_d}{N_h}\,\frac{B_d}{N_h} \le B_L \qquad (3)$$

Since the load and traffic are symmetric and the traffic in the two directions is assumed to be independent, the total throughput of Host H over both direction links (counter-clockwise and clockwise) is equal to two times the throughput of H through Link a. The total throughput of the system is then equal to the throughput of H multiplied by the number of hosts, $N_h$. We denote the total throughput of the whole system by $\mathrm{Throughput}_{Aggregate}$, which can be expressed as in (4):

$$\mathrm{Throughput}_{Aggregate} = B_d N_d \quad \text{subject to} \quad \frac{N_d B_d}{2 N_h} + \frac{1}{2}\,\frac{N_d}{N_h}\left(\frac{N_h}{2}-1\right)\frac{N_h}{2}\,\frac{B_d}{N_h} \le B_L \qquad (4)$$

Note that the first part ($B_d N_d$) of (4) is obvious, because the achievable throughput is limited by the total available disk bandwidth in any case. The constraint part (the inequality) in (4) is more important and interesting, because it tells us that when the left-hand side of the inequality (with the original disk bandwidth, before being constrained) is greater than $B_L$, the links become the bottleneck. In such a case, the disk bandwidth is constrained by the inequality. As a result, the disks will be under-utilized: part of the disk bandwidth is wasted, and the total achievable throughput is reduced as well. On the other hand, if the left-hand side of the inequality is less than $B_L$ (with the original disk bandwidth), the links provide more than enough bandwidth for all the $N_d$ disks on the loop. One possible reason for this under-utilization is that the number of disks is not large enough; in this case, we could add more disks to the loop to fully utilize the link bandwidth. Another possible reason is that the request size is too small to produce enough effective disk throughput.
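Before moving to other traffic scenarios, the pure-read result can be made concrete with the following short Python sketch (our own illustration, not code from the paper; the function and variable names are ours). It evaluates (4) by capping the per-disk throughput so that the Link a load does not exceed $B_L$, and returns the resulting aggregate throughput.

```python
def aggregate_throughput_pure_read(n_hosts, n_disks, b_disk, b_link):
    """Evaluate Eq. (4): aggregate pure-read throughput of a loop.

    n_hosts : number of hosts on the loop (assumed even, or 1)
    n_disks : number of disks on the loop
    b_disk  : unconstrained per-disk throughput (MB/s)
    b_link  : effective link bandwidth B_L (MB/s)
    """
    # Load on Link a per unit of disk throughput (Effective + Cross Traffic).
    effective = n_disks / (2.0 * n_hosts)
    cross = 0.5 * (n_hosts / 2.0) * (n_hosts / 2.0 - 1.0) * (n_disks / n_hosts) / n_hosts
    load_per_bd = effective + max(cross, 0.0)   # no Cross Traffic when N_h <= 2

    # If the link would be overloaded, scale the disk throughput down.
    b_disk_eff = min(b_disk, b_link / load_per_bd)
    return n_disks * b_disk_eff                 # constrained N_d * B_d

# Example anticipating the numbers of Section 3: 8 hosts, 461 disks,
# 0.28 MB/s per disk (4 KB requests), 20 MB/s links.  The link constraint,
# not the disks, is the bottleneck; this should print about 128 MB/s.
print(aggregate_throughput_pure_read(8, 461, 0.28, 20.0))
```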
2.3 Other Traffic Scenario Variations

In this subsection, we extend our analysis to cover three other traffic patterns. In the first scenario, we cover the case in which read and write requests coexist, with a certain percentage of each; we use this scenario to demonstrate how the analytical result can be applied with mixed Read and Write. In the second and the third scenarios, we focus on the pure Read case for simplicity. In the second scenario, we consider the case in which each host only reads data from some of the disks on the loop. In the third scenario, we consider a non-uniform load among disks from each host. The difference between the second and the third scenarios is that in the second scenario only some of the disks are accessed by each host, but the access load on those disks from each host is uniform; in the third scenario, each host may generate a different load on each disk, that is, the traffic applies a non-uniform load among disks from each host.

Mixed Read and Write

The analysis for pure write is very similar to that for pure read. The only difference is the direction of the data flow: for a read, the data flow from each storage device to a host, while for a write they go in the opposite direction, from the host to the storage. Therefore, the definition of the Effective Traffic in pure write for Host H through Link b needs to be changed to the traffic going out from Host H to the storage through Link b. The analysis we did for pure read on Link a can then be applied to pure write on Link b. As a result, the total achievable throughput and the constraint are exactly the same as those for pure read defined in (4). For mixed read and write, we define the Effective Traffic to be the Effective Traffic of read plus the Effective Traffic of write. We define the read-to-write ratio, $R_{rw}$, as the ratio of the number of read commands to the number of write commands. $R_{rw}$ can also be expressed as $P_r/P_w$, where $P_r$ and $P_w$ are the percentages of read and write commands, respectively. We assume each request is of the same size and, for simplicity, that read and write have the same throughput from each device. Therefore, $\frac{P_r}{P_r+P_w}$ and $\frac{P_w}{P_r+P_w}$ of the device throughput are for read and write, respectively. With an analysis similar to that for pure read and pure write, we can derive the Effective Traffic of Host H through Link a (for read), denoted by $\mathrm{Traffic}_{Effective}(read; P_r, P_w)$, which is defined in (5). The Cross Traffic through Link a for read is equal to $\mathrm{Traffic}_{Cross}(read; P_r, P_w)$, as defined in (6).

$$\mathrm{Traffic}_{Effective}(read; P_r, P_w) = \frac{P_r}{P_r+P_w}\,\frac{N_d}{2}\,\frac{B_d}{N_h} \qquad (5)$$

$$\mathrm{Traffic}_{Cross}(read; P_r, P_w) = \frac{P_r}{P_r+P_w}\,\frac{1}{2}\,\frac{N_h}{2}\left(\frac{N_h}{2}-1\right)\frac{N_d}{N_h}\,\frac{B_d}{N_h} \qquad (6)$$

The Effective and Cross Traffic for write through Link b can be expressed in (7) and (8), respectively.

$$\mathrm{Traffic}_{Effective}(write; P_r, P_w) = \frac{P_w}{P_r+P_w}\,\frac{N_d}{2}\,\frac{B_d}{N_h} \qquad (7)$$

$$\mathrm{Traffic}_{Cross}(write; P_r, P_w) = \frac{P_w}{P_r+P_w}\,\frac{1}{2}\,\frac{N_h}{2}\left(\frac{N_h}{2}-1\right)\frac{N_d}{N_h}\,\frac{B_d}{N_h} \qquad (8)$$

The total achievable throughput is equal to two (both directions) times $N_h$ multiplied by the sum of $\mathrm{Traffic}_{Effective}(read; P_r, P_w)$ and $\mathrm{Traffic}_{Effective}(write; P_r, P_w)$. Note that the Cross Traffic for read and write goes through both Link a and Link b. Therefore, the total Cross Traffic on either Link a or Link b is equal to $\mathrm{Traffic}_{Cross}(read; P_r, P_w) + \mathrm{Traffic}_{Cross}(write; P_r, P_w)$. After considering the bandwidth requirement of both the Effective Traffic and the Cross Traffic and the link bandwidth $B_L$, the aggregate achievable throughput can be expressed as in (9):

$$\mathrm{Throughput}_{Aggregate} = B_d N_d \quad \text{subject to} \quad \frac{\max(P_r, P_w)}{P_r+P_w}\,\frac{N_d B_d}{2 N_h} + \frac{1}{2}\,\frac{N_d}{N_h}\left(\frac{N_h}{2}-1\right)\frac{N_h}{2}\,\frac{B_d}{N_h} \le B_L \qquad (9)$$

The constraint part in (9) is almost the same as that in (4) except for the factor $\frac{\max(P_r, P_w)}{P_r+P_w}$, which arises because the percentages of Read and Write affect the distribution of the traffic on the links.

Different Disk Sharing Percentage

We define the Disk Sharing Percentage $R_{DS}$ as the number of disks accessed by each host over the number of disks on a loop. For example, $R_{DS} = 1$ (or 100%) means each host accesses all the disks, and $R_{DS} = 0.5$ (or 50%) means each host only accesses half of the disks on the loop. We assume that when only a part of the disks on a loop is accessed by a host, the disks closer (in terms of number of hops) to that host are the ones accessed (see Figure 4). Note that when each host accesses a fraction $R_{DS}$ of the total disks on a loop, each disk is accessed by a fraction $R_{DS}$ of the hosts on the loop. We can apply an analysis similar to that for pure read. Since each disk is accessed (or shared) by $R_{DS} N_h$ hosts, it delivers $\frac{B_d}{R_{DS} N_h}$ to each host. The number of disks from which a host reads data through Link a is $\frac{R_{DS} N_d}{2}$. Therefore, the Effective Traffic remains the same as (1) regardless of the value of $R_{DS}$. The Cross Traffic is less than that in the pure read case; it is equal to $\frac{1}{2}\,\frac{R_{DS} N_h}{2}\left(\frac{R_{DS} N_h}{2}-1\right)\frac{N_d}{N_h}\,\frac{B_d}{R_{DS} N_h}$. The achievable throughput with a degree of disk sharing equal to $R_{DS}$ can be expressed as in (10):

$$\mathrm{Throughput}_{Aggregate}(R_{DS}) = B_d N_d \quad \text{subject to} \quad \frac{N_d B_d}{2 N_h} + \frac{1}{2}\,\frac{N_d}{N_h}\left(\frac{R_{DS} N_h}{2}-1\right)\frac{N_h}{2}\,\frac{B_d}{N_h} \le B_L \qquad (10)$$

Figure 4: Each host accesses the $R_{DS} N_d$ disks closest to it, with $R_{DS} N_d/2$ disks on each side.

The only difference between (10) and (4) is the factor $R_{DS}$. When $R_{DS}$ equals one, (10) is exactly the same as (4). When $R_{DS}$ decreases, the left-hand side of the inequality also decreases, which means the inequality is more likely to hold. That is, more disks may be added to the loop before saturating it (with the same disk bandwidth).
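The constraint in (10) can also be evaluated numerically. The sketch below (our own illustration, not from the paper; the function name and the interpretation of the result are ours) solves the inequality for the largest number of disks that still satisfies the Link a constraint.

```python
def max_disks_before_saturation(n_hosts, b_disk, b_link, r_ds=1.0):
    """Largest N_d satisfying the link constraint of Eq. (10).

    With r_ds = 1 this is the pure-read constraint of Eq. (4).
    """
    effective = b_disk / (2.0 * n_hosts)                                 # per-disk Effective Traffic
    cross = 0.5 * (r_ds * n_hosts / 2.0 - 1.0) * 0.5 * b_disk / n_hosts  # per-disk Cross Traffic
    per_disk_load = effective + max(cross, 0.0)   # Cross Traffic cannot be negative
    return int(b_link // per_disk_load)

# Example with the paper's parameters (20 MB/s links, 0.28 MB/s per disk for
# 4 KB requests): a single host with R_DS = 0.25 can drive roughly 140 disks
# before Link a saturates, consistent with the numbers reported in Section 3.
print(max_disks_before_saturation(1, 0.28, 20.0, r_ds=0.25))
```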

A Non-uniform Load Distribution among Disks from Each Host

In the following paragraphs, we consider a non-uniform load among the disks from each host and study the achievable throughput under different load distributions. We first partition the disks on a loop into two groups for each host. The first group is called the local disks and the second group the remote disks. The local disks of a Host H consist of all the disks whose shortest path to Host H does not include any other host as an intermediate node. The remote disks of H consist of all the other disks, which are not local disks of H. By the same concept, we call a host a local host to a disk if the disk is a local disk to that host; otherwise, we call that host a remote host to the disk. Note that local and remote disks are defined with respect to each host; therefore, each disk can be a local disk to some hosts and a remote disk to the others. We define the Ratio of Disk Load, $R_{DL}$, to be the number of requests generated by each host to each of its remote disks over that to each of its local disks. Each disk has 2 local hosts and $N_h - 2$ remote hosts. Since the ratio of the load from each remote host to that from each local host is $R_{DL}$, each disk delivers $\frac{1}{2+(N_h-2)R_{DL}}$ of its throughput to each of its two local hosts and $\frac{R_{DL}}{2+(N_h-2)R_{DL}}$ to each of its $N_h - 2$ remote hosts, respectively. Following an analysis similar to that described in the mixed read and write case, we can express the achievable throughput with a non-uniform load distribution defined by $R_{DL}$ in (11), with read commands only:

$$\mathrm{Throughput}_{H}(a)(R_{DL}) = \frac{N_d}{N_h}\,\frac{B_d}{2+(N_h-2)R_{DL}} + \left(\frac{N_d}{2}-\frac{N_d}{N_h}\right)\frac{R_{DL}\,B_d}{2+(N_h-2)R_{DL}}$$
$$\text{subject to} \quad \frac{N_d}{N_h}\,\frac{B_d}{2+(N_h-2)R_{DL}} + \left(\frac{N_d}{2}-\frac{N_d}{N_h}\right)\frac{R_{DL}\,B_d}{2+(N_h-2)R_{DL}} + \frac{1}{2}\,\frac{N_h}{2}\left(\frac{N_h}{2}-1\right)\frac{N_d}{N_h}\,\frac{R_{DL}\,B_d}{2+(N_h-2)R_{DL}} \le B_L \qquad (11)$$

Figure 5 shows an example of a non-uniform load distribution with six hosts. In this example, Hosts H3 and H4 are the two local hosts of Disk D3; all the other hosts are remote hosts to Disk D3. Therefore, each of H3 and H4 gets $\frac{1}{2+4R_{DL}}$ of the disk bandwidth $B_d$, and each of the other four hosts gets $\frac{R_{DL}}{2+4R_{DL}}$ of the disk bandwidth.

Figure 5: An example of non-uniform load distribution with six hosts (H1 to H6) and disks D1 to D6; each of the two local hosts of a disk receives $\frac{B_d}{2+4R_{DL}}$ and each remote host receives $\frac{R_{DL} B_d}{2+4R_{DL}}$.
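Equation (11) can likewise be evaluated directly. The sketch below (our own illustration; the function name and the disk count used in the example are our assumptions, not values stated with Figure 9) computes the read throughput of one host through Link a for a given $R_{DL}$, scaling the disk traffic down proportionally when Link a saturates.

```python
def per_host_throughput_nonuniform(n_hosts, n_disks, b_disk, b_link, r_dl):
    """Evaluate Eq. (11): read throughput of one host through Link a when
    each remote disk receives r_dl times the load of a local disk."""
    share = b_disk / (2.0 + (n_hosts - 2.0) * r_dl)   # disk share seen by a local host
    local = (n_disks / n_hosts) * share               # local disks on this side of the loop
    remote = (n_disks / 2.0 - n_disks / n_hosts) * r_dl * share
    cross = 0.5 * (n_hosts / 2.0) * (n_hosts / 2.0 - 1.0) * (n_disks / n_hosts) * r_dl * share
    link_load = local + remote + cross
    if link_load <= b_link:
        return local + remote
    # Link a is saturated: scale all disk traffic down proportionally.
    return (local + remote) * (b_link / link_load)

# Hypothetical usage: 8 hosts, 1152 disks (our assumption), 4 KB requests
# (0.28 MB/s per disk), 20 MB/s links.  The aggregate is 2 * n_hosts times the
# per-host value; for r_dl = 0.2 this lands close to the roughly 182 MB/s
# reported in Section 3.
print(2 * 8 * per_host_throughput_nonuniform(8, 1152, 0.28, 20.0, 0.2))
```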
3 Numerical Results

In this section, we discuss the performance characteristics of a serial storage interface with spatial reuse by applying the analytic model derived in Section 2. We show the results for three different traffic and load scenarios. The first case is pure read: we show how many disks are needed to saturate a loop with pure read, and how large the maximum achievable throughput is when the loop is saturated. In the second scenario, we show the achievable throughput for two different values of $R_{rw}$: $R_{rw} = 1$ (50% read and 50% write) and $R_{rw} = 2$ (2/3 read and 1/3 write). In the third scenario, we apply different loads to local and remote disks from each host and show how the achievable throughput changes as this load distribution changes. Throughout this section, we assume the link bandwidth $B_L$ is 20 MB/sec. We also use Table 1 for the disk throughput with different request sizes.

Table 1: Request size versus disk throughput (IBM 4.5GB UltraStar SSA disk [10])

Request Size    Disk Throughput
4 KB            0.28 MB/sec
8 KB            0.54 MB/sec
16 KB           1.02 MB/sec
32 KB           1.82 MB/sec
64 KB           2.81 MB/sec
128 KB          3.94 MB/sec
256 KB          4.99 MB/sec
512 KB          5.81 MB/sec
1024 KB         6.30 MB/sec

The number of disks needed to saturate a loop (Pure read)

In the following paragraphs, we show and discuss the number of disks needed to saturate a loop with pure read. We applied $R_{DS} = 1$, $\frac{1}{2}$, and $\frac{1}{4}$ to (10). Different request sizes from 4 KB to 1 MB and configurations with one to eight hosts were considered. Figure 6 shows the minimum number of disks needed to saturate a loop. The results show that in a single-host configuration with $R_{DS} = \frac{1}{4}$, at least 144 disks are needed to saturate a loop with 4 KB requests. When the number of hosts increases to eight, the number of disks needed also increases eight times (to 1152), and the achievable throughput increases from 40 MB/sec to 320 MB/sec (see Figure 7). This indicates that the achievable throughput scales linearly with the number of hosts in this case, because the $R_{DS} = \frac{1}{4}$ used here results in no Cross Traffic as long as the number of hosts is less than or equal to eight. The results also indicate that under this load and configuration, more than one thousand disks may be needed to saturate a loop with eight hosts. As $R_{DS}$ increases to one, which means each host accesses all the disks on the loop, the maximum achievable throughput with eight hosts is reduced to 128 MB/sec (from 320 MB/sec with $R_{DS} = \frac{1}{4}$) when at least 461 disks are used. This is because the Cross Traffic increases with larger $R_{DS}$. It also indicates that the scalability of the achievable throughput is poor as the number of hosts increases with $R_{DS} = 1$. With either value of $R_{DS}$, a serial storage interface needs a large address space to handle enough storage devices to saturate a loop with eight hosts and 4 KB reads.

The number of disks needed to saturate a loop (Mixed read and write)

Next, we show the achievable throughput for two different read-to-write ratios, Pr:Pw = 1:1 and Pr:Pw = 2:1. Figure 8(a) shows the achievable throughput with Pr:Pw = 1:1, and Figure 8(b) shows the results for Pr:Pw = 2:1. Configurations with up to eight hosts were considered. The results show that the maximum throughput is achieved with Pr:Pw = 1:1: with a single host, Pr:Pw = 1:1 achieves 80 MB/sec, while Pr:Pw = 2:1 only achieves 60 MB/sec with the same configuration. As in the pure read case, the achievable throughput scales up linearly as the number of hosts increases up to eight with $R_{DS} = 0.25$. With $R_{DS} = 1$, the same Cross Traffic overhead seen in the pure read case also appears with mixed read and write.

Disk load distribution versus number of disks needed to saturate a loop

In the previous two traffic scenarios, we have seen substantial degradation of the achievable throughput caused by the Cross Traffic when every disk is shared by all the hosts ($R_{DS} = 1$). This indicates that $R_{DS}$ should be kept as low as possible to achieve higher throughput. However, disk sharing has the advantage of reducing storage capacity needs: multiple hosts can share the same copy of data. Without disk sharing, each host may need to keep a copy on its local disks. With only one copy of the data, data consistency is also easier to maintain than with multiple copies, since it eliminates the need to keep all the copies consistent. We now apply different load distributions between the local and remote disks by changing $R_{DL}$. When $R_{DL} = 1$, the load on each local and remote disk is the same; this is the uniform load distribution discussed above. $R_{DL} = 0$ is another special case in which each host only accesses its local disks; this is the same as the case $R_{DS} = 0.25$ with $N_h = 8$ under a uniform load distribution. We show the achievable throughput versus the number of hosts on a loop with different $R_{DL}$ in the pure read scenario. Figure 9 shows the achievable throughput with $R_{DL}$ equal to 0.2, 0.4, 0.6, 0.8 and 1. The results show that when the number of hosts is eight, the throughput with $R_{DL} = 0.2$ is still far below the throughput with $R_{DL} = 0$: the achievable throughput is about 182 MB/sec with $R_{DL} = 0.2$, while it is 320 MB/sec with $R_{DL} = 0$ (no disk sharing). The results again suggest that disk sharing should be kept to a minimum.
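As a rough cross-check of the trend in Figure 7 (our own sketch, not code from the paper; the function name and the derivation of the closed form from (10) are ours), the following snippet computes the saturated aggregate throughput, i.e. the value of the constraint bound once the links become the bottleneck.

```python
def saturated_aggregate_throughput(n_hosts, b_link=20.0, r_ds=1.0):
    """Aggregate throughput once the loop is link-limited, from Eq. (10):
    the Link a load per unit of (N_d * B_d), divided into B_L."""
    load_factor = 1.0 / (2.0 * n_hosts) \
        + max(r_ds * n_hosts / 2.0 - 1.0, 0.0) / (4.0 * n_hosts)
    return b_link / load_factor

# Reproduces the trend of Figure 7: with R_DS = 0.25 the saturated throughput
# grows linearly with the number of hosts (40, 80, ..., 320 MB/s for 1..8
# hosts), while with R_DS = 1 the Cross Traffic caps it (about 128 MB/s at 8).
for hosts in (1, 2, 4, 8):
    print(hosts, saturated_aggregate_throughput(hosts, r_ds=0.25),
          saturated_aggregate_throughput(hosts, r_ds=1.0))
```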

4 Conclusions
In this paper, we derived an analytic model for serial storage interfaces with the spatial reuse feature. We investigated the maximum aggregate throughput for different system configurations and load distributions using the model. We showed how many disks are needed to saturate a loop with different numbers of hosts and different load scenarios. We found that the number of disks needed to saturate a loop may range from 50 to more than one thousand, depending on the request size. Therefore, a proper configuration is important for different applications in order to achieve high throughput. We also found that load balancing by uniformly distributing the load to all the disks on a loop may cause a large amount of Cross Traffic. This Cross Traffic may reduce the achievable throughput by more than half in some cases. In serial storage interfaces, when a link breaks, the solution is to take the alternative route (since there are two paths between each pair of source and destination nodes). Since the cost of the Cross Traffic is high, as we have shown in this paper, taking the alternative route may cause longer delays for all the existing I/O operations under high load. Therefore, for delay-sensitive applications, a portion of the bandwidth may need to be reserved in case of a link failure in order to provide consistent performance. It has also been shown in this paper that data sharing may result in a reduction of the achievable throughput, but it may still be useful in some situations. For example, for files which are not frequently accessed and need large storage space, disk sharing may provide a good solution. Thus, file allocation is important in serial storage interfaces. For files with dynamic access frequencies, dynamic file migration to adapt to the load changes may be necessary.

Figure 6: Number of disks needed to saturate the channel versus number of hosts (pure read, 20 MB/sec link, request sizes from 4 KB to 1024 KB). (a) $R_{DS} = 1$; (b) $R_{DS} = 0.5$; (c) $R_{DS} = 0.25$.

Figure 7: Maximum achievable throughput versus number of hosts when a loop is saturated (pure read, 20 MB/sec link), for $R_{DS}$ = 1.00, 0.50 and 0.25.

Acknowledgment
The authors would like to thank Yuewei Wang and Jenwei Hsieh for their valuable suggestions and comments on this paper.

References
[1] ANSI X3.295-1996 (SSA-TL1), Information Technology - Serial Storage Architecture - Transport Layer 1 (SSA-TL1), Draft Proposed American National Standard, American National Standards Institute, Inc., July 1996.
[2] ANSI X3.294-1996 (SSA-S2P), Information Technology - Serial Storage Architecture - SCSI-2 Protocol (SSA-S2P), Draft Proposed American National Standard, American National Standards Institute, Inc., July 1996.
[3] David H.C. Du, Tai-Sheng Chang, Jenwei Hsieh, Yuewei Wang and Simon Shim, Emerging Serial Storage Interfaces: Serial Storage Architecture (SSA) and Fibre Channel - Arbitrated Loop (FC-AL), TR 96-073, Technical Report, Department of Computer Science, University of Minnesota.
[4] David H.C. Du, Jenwei Hsieh, Tai-Sheng Chang, Yuewei Wang and Simon Shim, Performance Study of Serial Storage Architecture (SSA) and Fibre Channel - Arbitrated Loop (FC-AL), to appear in IEEE Parallel and Distributed Technology.
[5] ANSI X3.272-199x, Fibre Channel - Aaron Proposal, Rev. .001, October 4, 1996.
[6] John P. Scheible (Technical Editor), Fibre Channel - TORN Proposal, a working document of T11, a Task Group of Accredited Standards Committee NCITS, Revision 6, July 23, 1997. ftp.symbios.com/pub/standards/io/x3t11/document.97/97w150r6.pdf
[7] ANSI X3.269-199x, Fibre Channel Protocol for SCSI, Draft Proposed American National Standard, American National Standards Institute, Inc., May 30, 1995.
[8] ANSI X3.272-199x, Fibre Channel - Arbitrated Loop (FC-AL), Revision 4.5, American National Standards Institute, Inc., June 1, 1995.
[9] ANSI X3.272-199x, Fibre Channel - Arbitrated Loop (FC-AL-2), Revision 5.1, American National Standards Institute, Inc., March 26, 1996.
[10] IBM Corporation, Functional Specification, Ultrastar XP Models, 1995.
[11] Andrew S. Tanenbaum, Computer Networks, 3rd ed., Prentice Hall, 1996.
[12] J. Chen, I. Cidon, and Y. Ofek, A Local Fairness Algorithm for the MetaRing, and Its Performance Study, IEEE INFOCOM, 1992, pp. 1635-1641.
[13] Y. Ofek and M. Yung, METANET: Principles of an Arbitrary Topology LAN, IEEE/ACM Transactions on Networking, Vol. 3, No. 2, April 1995, pp. 169-180.

Figure 8: The achievable throughput with different read-to-write ratios versus number of hosts (20 MB/sec link), for $R_{DS}$ = 1.00, 0.50 and 0.25. (a) Pr:Pw = 1:1; (b) Pr:Pw = 2:1.

Figure 9: The achievable throughput versus number of hosts with different $R_{DL}$ (0.2, 0.4, 0.6, 0.8, 1.0), on a 20 MB/sec link.
