Profit-effective parallel computing
Yong Yan, HAL Computer Systems Inc.
Xiaodong Zhang, College of William & Mary
April–June 1999 65
Cost, cost(m, t), represents the development and maintenance cost of a system with m processors in a t time period.

The production function, Pro(P(m), t), measures the profit (dollars) gained from P(m) during t, for m > 1. In practice, a production function might exhibit different relationships to a computer’s performance, which we examine in the context of economics [8]. We use three common production functions (linear, superlinear, and sublinear), which measure the parallel system’s production increase compared to a sequential system. The relationship of Pro(P(m), t) to P(m) is

• linearly proportional if Pro(P(m), t) = Pro(P(1), t) × P(m),
• superlinearly proportional if Pro(P(m), t) > Pro(P(1), t) × P(m), and
• sublinearly proportional if Pro(P(m), t) < Pro(P(1), t) × P(m).

Lifetime, Lf, is the interval between when someone starts using a parallel system and when a sequential system with the same order of performance goes on the market. This assumes that a new state-of-the-art parallel system always outperforms the fastest existing sequential system. This is also consistent with the fact that sequential systems have been continuously improved to perform as well as existing parallel systems. Also, if a parallel system can be upgraded in a timely manner, its lifetime is (theoretically) infinite. In this case, parallel computing only needs to be evaluated in unitary time (that is, Lf = 1), and the cost is amortized over the lifetime. In the cost-effective model, the evaluation time is also unitary time.

Profit is a parallel system’s net profit:

Profit(m) = Pro(P(m), Lf) − cost(m, Lf).

Similarly, a sequential system’s profit during Lf is

Profit(1) = Pro(P(1), Lf) − cost(1, Lf).

In practice, the profit might be negative. To simplify the discussion, we assume the profit is positive, which means that Pro(P(i), Lf) > cost(i, Lf), for i = 1, ..., m.

We characterize a parallel system’s profit-effectiveness for applications by profitup, the ratio of the parallel system’s profit to the sequential system’s profit:

\[
\mathrm{profitup}(m) = \frac{\mathrm{Profit}(m)}{\mathrm{Profit}(1)} = \frac{\mathrm{Pro}(P(m), L_f) - \mathrm{cost}(m, L_f)}{\mathrm{Pro}(P(1), L_f) - \mathrm{cost}(1, L_f)}. \tag{2}
\]

THE RELATIONSHIP BETWEEN COST-EFFECTIVE AND PROFIT-EFFECTIVE PARALLEL COMPUTING

In our analysis, we normalize the sequential system’s performance as unitary performance, P(1) = 1, and the system’s cost in unitary time as unitary cost, cost(1, Lf) = 1. Under these conditions, P(m) is equivalent to the speedup, speedup(m), and the cost of parallel computing, cost(m, Lf), is the costup of parallel computing, costup(m). Furthermore, Pro(1, 1) and Pro(P(m), 1) are the production functions of sequential and parallel computing, respectively, in unitary time.

Substituting P(1) = 1, cost(1, Lf) = 1, P(m) = speedup(m), and cost(m, Lf) = costup(m) into Equation 2, we obtain this relationship among profitup, speedup, and costup:

\[
\mathrm{profitup}(m) = \frac{\mathrm{Pro}(\mathrm{speedup}(m), 1) - \mathrm{costup}(m)}{\mathrm{Pro}(1, 1) - 1}. \tag{3}
\]

If Pro(speedup(m), 1) is a linear function of the speedup, that is,

Pro(speedup(m), 1) = Pro(1, 1) × speedup(m),

the profitup is

\[
\mathrm{profitup}(m) = \frac{\mathrm{Pro}(1, 1) \times \mathrm{speedup}(m) - \mathrm{costup}(m)}{\mathrm{Pro}(1, 1) - 1}. \tag{4}
\]

With speedup(m) > costup(m) for cost-effectiveness in Equation 4, we have

\[
\mathrm{profitup}(m) > \frac{\mathrm{Pro}(1, 1) \times \mathrm{costup}(m) - \mathrm{costup}(m)}{\mathrm{Pro}(1, 1) - 1} = \mathrm{costup}(m) > 1.
\]

So, we reach this conclusion:

Conclusion 2. If the production from parallel computing is linearly proportional to the computation’s speedup, and the computation is cost-effective, then the computation is also profit-effective.

For the same production function, is a profit-effective parallel computation (profitup(m) > 1) also cost-effective? To answer this question, we substitute profitup(m) > 1 into Equation 4 to obtain

\[
\mathrm{Pro}(1, 1) \times \mathrm{speedup}(m) - \mathrm{costup}(m) > \mathrm{Pro}(1, 1) - 1, \tag{5}
\]

or

\[
\mathrm{speedup}(m) > \frac{\mathrm{costup}(m) + \mathrm{Pro}(1, 1) - 1}{\mathrm{Pro}(1, 1)}. \tag{6}
\]
66 IEEE Concurrency
Because Pro(1, 1) > 1 and costup(m) > 1, Equation 6 can hold even when speedup(m) is not larger than costup(m); that is, it does not necessarily satisfy the condition of cost-effectiveness, speedup(m) > costup(m). So, we reach this conclusion:

Conclusion 3. If the production from parallel computing is linearly proportional to the computation’s speedup, a profit-effective parallel computation might not be cost-effective.

Conclusions 2 and 3 indicate that the cost-effective metric is too pessimistic in judging whether a parallel computation is financially justifiable for a linear production function.

By Equation 3, if parallel computing is profit-effective (profitup(m) > 1), this condition should be satisfied:

\[
\mathrm{Pro}(\mathrm{speedup}(m), 1) > \mathrm{costup}(m) + \mathrm{Pro}(1, 1) - 1. \tag{7}
\]

If the production function is superlinear in the parallel computing’s speedup, and the computation is cost-effective (speedup(m) > costup(m) > 1), we have

\[
\begin{aligned}
\mathrm{Pro}(\mathrm{speedup}(m), 1) &> \mathrm{Pro}(1, 1) \times \mathrm{speedup}(m) > \mathrm{Pro}(1, 1) \times \mathrm{costup}(m) \\
&= \mathrm{costup}(m) + (\mathrm{Pro}(1, 1) - 1)\,\mathrm{costup}(m) \\
&> \mathrm{costup}(m) + \mathrm{Pro}(1, 1) - 1.
\end{aligned}
\]

Thus, Equation 7 holds whenever the computation is cost-effective. So, we reach this conclusion:

Conclusion 4. If the production function from parallel computing is superlinearly proportional to the computation’s speedup, and the computation is cost-effective, then the computation is also profit-effective.

If the production function is sublinear in the performance, the comparison sign in Equation 7 can be reversed even when speedup(m) > costup(m) > 1; that is, the following production function is a valid choice:

\[
\mathrm{Pro}(\mathrm{speedup}(m), 1) < \mathrm{costup}(m) + \mathrm{Pro}(1, 1) - 1.
\]

Substituting this sublinear production function into Equation 3, we obtain profitup(m) < 1. So, we reach this conclusion:

Conclusion 5. If the production function from parallel computing is sublinearly proportional to the computation’s speedup, a cost-effective computation might not be profit-effective.

We have shown that the production function is a major factor in determining whether a parallel computation is financially justifiable from a cost-effective or profit-effective point of view.

CASE STUDIES

We now demonstrate the difference between cost-effective computing and profit-effective computing for varying processor and memory costs. For consistency and fairness, we use the same vendor data on 1994 Silicon Graphics product prices that Wood and Hill used [4]. Current market prices are different, but that does not affect the comparison’s validity.

Like Wood and Hill, we use the prices of Silicon Graphics’ server products without including other costs, such as for software and application development. Computer system prices are based on the basic prices of Challenge series products. In 1994, a uniprocessor Challenge DM cost approximately $38,400 plus approximately $100 per Mbyte of memory. Thus, the uniprocessor cost is

cost(1, s) = $38,400 + $100 × s,

where s represents memory size in Mbytes. The price of a parallel Challenge XL of m processors with s′ Mbytes of shared memory is

cost(m, s′) = $81,600 + $20,000 × m + $100 × s′.

Assuming that the shared-memory size is identical to that of a uniprocessor workstation, we have s = s′. Dividing cost(m, s) by cost(1, s) and scaling both by the uniprocessor base price of $38,400, the costup is

\[
\mathrm{costup}(m, s) = \frac{2.125 + 0.521 \times m + 0.0026 \times s}{1 + 0.0026 \times s}. \tag{8}
\]

To calculate the profitup, we use Equation 3, where Pro(1, 1) represents the production gained by a uniprocessor. Here, the production is normalized to the sequential computing cost, and lifetime is normalized to the unit time. Pro(1, 1) is a value that depends closely on the application. In practice, Pro(1, 1) is a constant for a class of applications. This constant is updated to a different value when the system is upgraded. In our study, we assume that Pro(1, 1) is a constant.

Regarding the parallel-system production, we consider the production function

Pro(speedup(m), 1) = fp × speedup(m) × Pro(1, 1),

where fp < 1, fp = 1, or fp > 1 represents a sublinear, a linear, or a superlinear production function of the speedup. Again, the relationship between speedup(m) and m, the number of processors, is speedup(m) = fs × m, where fs ≤ 1. So, we characterize the cost effect by this profitup formula:

\[
\mathrm{profitup}(m, s) = \frac{f_p \times \mathrm{Pro}(1, 1) \times m \times f_s - \mathrm{costup}(m, s)}{\mathrm{Pro}(1, 1) - 1}. \tag{9}
\]

Memory is an important component of a system’s hardware. Table 1 shows the effect of memory cost as the memory size increases, for an eight-processor system where the speedup factor fs is 0.5 and Pro(1, 1) is 4. As the memory size increases, costup decreases and profitup increases.

The table’s top section shows that when memory is more than 500 Mbytes, the profitup is larger than 1 and the speedup is larger than the costup. However, for 300 or 400 Mbytes, the profitup is smaller than 1 even though the speedup is larger than the costup. This shows that cost-effective parallel computing is not necessarily profit-effective. In this case, parallel computing is profit-effective only when the speedup is sufficiently larger than the costup.

The table’s middle section shows that for 100 or 200 Mbytes, the profitup is larger than 1, but the speedup is smaller than the costup. This shows that profit-effective parallel computing is not necessarily cost-effective. The table’s bottom section also shows this result. In this case, parallel computing is profit-effective even when the speedup is smaller than the costup.

Processors are another important component of hardware cost because many scientific applications require a large number of them to exploit parallelism. Table 2 shows the effect of processor cost where the memory size is 512 Mbytes, the speedup factor fs is 0.25, and Pro(1, 1) is 4. As the number of processors increases, the costup, speedup, and profitup increase. The table’s top section, where production is a sublinear function of the speedup, shows that parallel computing is not profit-effective for 64 and 128 processors even though the speedup is larger than the costup. This shows that only a sufficiently large parallel system is likely to be profit-effective for a sublinear production function. The table’s bottom section, where production is a superlinear function of the speedup, shows that parallel computing is profit-effective for eight to 1,024 processors and is both cost-effective and profit-effective for 64 to 1,024 processors.

Because high performance has strongly motivated parallel-computing research and development for advanced applications, profit has not been a real concern. However, with rapid advances in commodity processors and networking technology, and with rapidly changing global political and economic structures, mainstream parallel computing platforms are shifting from expensive, custom-designed massively parallel processing machines to cheap, commodity-designed symmetric multiprocessors and networks of SMPs, workstations, and PCs. Therefore, more and more users are serious about the profit gained from parallel computing.

These trends and our work indicate that profit analysis is necessary for evaluating parallel computing’s effectiveness. Two major cost components that our case study did not quantitatively consider are the lifetime software and hardware maintenance costs for a system, and the human cost to develop efficient parallel programs. In practice, the profit model should include these two application-dependent components. Also,
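Table 1. The effect of memory cost. The number of processors (m) = 8, speedup(m) = 0.5m, and Pro(1, 1) = 4. Costup and profitup are calculated by Equations 8 and 9. Columns give the memory size s in Mbytes.

| Production function | Metric | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900 | 1,000 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Sublinear, fp = 0.4 | Costup | 5.2 | 4.5 | 4.0 | 3.6 | 3.3 | 3.1 | 2.9 | 2.7 | 2.6 | 2.5 |
| | Speedup | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 |
| | Profitup | 0.4 | 0.6 | 0.8 | 0.9 | 1.0 | 1.1 | 1.1 | 1.2 | 1.2 | 1.3 |
| Linear, fp = 1.0 | Costup | 5.2 | 4.5 | 4.0 | 3.6 | 3.3 | 3.1 | 2.9 | 2.7 | 2.6 | 2.5 |
| | Speedup | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 |
| | Profitup | 3.6 | 3.8 | 4.0 | 4.1 | 4.2 | 4.3 | 4.4 | 4.4 | 4.5 | 4.5 |
| Superlinear, fp = 1.2 | Costup | 5.2 | 4.5 | 3.9 | 3.6 | 3.3 | 3.1 | 2.9 | 2.7 | 2.6 | 2.5 |
| | Speedup | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 |
| | Profitup | 4.7 | 4.9 | 5.1 | 5.2 | 5.3 | 5.4 | 5.4 | 5.5 | 5.5 | 5.6 |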
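The entries of Table 1 can be approximated directly from Equations 8 and 9. The following is a minimal sketch, not the authors' own code: the function names costup, profitup, and table1_rows are illustrative, and the memory sizes of 100 to 1,000 Mbytes in steps of 100 are an assumption, chosen because they match the sizes discussed in the text and the tabulated costup values.

```python
def costup(m, s):
    """Equation 8: cost of an m-processor system with s Mbytes, relative to a uniprocessor."""
    return (2.125 + 0.521 * m + 0.0026 * s) / (1 + 0.0026 * s)


def profitup(m, s, fp, fs, pro11):
    """Equation 9: profit ratio, with speedup(m) = fs * m and production factor fp."""
    return (fp * pro11 * fs * m - costup(m, s)) / (pro11 - 1)


def table1_rows(fp, m=8, fs=0.5, pro11=4.0, sizes=range(100, 1001, 100)):
    """Return (s, costup, speedup, profitup) tuples, rounded to one decimal place.

    The default sizes (100..1000 Mbytes) are an assumption, not stated in the table.
    """
    return [
        (s, round(costup(m, s), 1), fs * m, round(profitup(m, s, fp, fs, pro11), 1))
        for s in sizes
    ]


if __name__ == "__main__":
    for fp in (0.4, 1.0, 1.2):  # sublinear, linear, superlinear
        print(f"fp = {fp}")
        for s, cu, su, pu in table1_rows(fp):
            print(f"  s = {s:4d} Mbytes  costup = {cu:.1f}  speedup = {su:.1f}  profitup = {pu:.1f}")
```

For s = 100 Mbytes and fp = 0.4, this evaluates to a costup of about 5.2 and a profitup of about 0.4, in line with the first column of the table.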
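Table 2. The effect of processor cost. The memory size is 512 Mbytes, speedup(m) = 0.25m, and Pro(1, 1) = 4. Costup and profitup are calculated by Equations 8 and 9. Columns give the number of processors m.

| Production function | Metric | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 | 512 | 1,024 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Sublinear, fp = 0.25 | Costup | 1.9 | 2.4 | 3.3 | 5.1 | 8.6 | 15.8 | 30.0 | 58.6 | 115.8 | 230.1 |
| | Speedup | 0.5 | 1.0 | 2.0 | 4.0 | 8.0 | 16.0 | 32.0 | 64.0 | 128.0 | 256.0 |
| | Profitup | −0.5 | −0.5 | −0.4 | −0.4 | −0.2 | 0.1 | 0.6 | 1.8 | 4.1 | 8.6 |
| Superlinear, fp = 1.2 | Costup | 1.9 | 2.4 | 3.3 | 5.1 | 8.6 | 15.8 | 30.0 | 58.6 | 115.7 | 229.8 |
| | Speedup | 0.5 | 1.0 | 2.0 | 4.0 | 8.0 | 16.0 | 32.0 | 64.0 | 128.0 | 256.0 |
| | Profitup | 0.2 | 0.8 | 2.1 | 4.7 | 9.9 | 20.3 | 41.2 | 82.9 | 166.2 | 332.9 |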
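The two tests compared in Table 2, cost-effectiveness (speedup > costup) and profit-effectiveness (profitup > 1), can be sketched as below. This is an illustrative sketch under stated assumptions: it starts from the 1994 list prices quoted in the case study rather than the rounded coefficients of Equation 8, and the constant names and the classify helper are hypothetical, not part of the original study.

```python
# 1994 Silicon Graphics prices quoted in the case study (dollars).
UNI_BASE = 38_400      # uniprocessor Challenge DM base price
XL_BASE = 81_600       # Challenge XL base price
XL_PER_CPU = 20_000    # Challenge XL price per processor
PER_MBYTE = 100        # memory price per Mbyte


def cost_uni(s):
    """cost(1, s): uniprocessor cost with s Mbytes of memory."""
    return UNI_BASE + PER_MBYTE * s


def cost_par(m, s):
    """cost(m, s): m-processor cost with s Mbytes of shared memory."""
    return XL_BASE + XL_PER_CPU * m + PER_MBYTE * s


def costup(m, s):
    """Ratio of parallel to uniprocessor cost (the unrounded form of Equation 8)."""
    return cost_par(m, s) / cost_uni(s)


def classify(m, s, fp, fs, pro11):
    """Apply both tests for speedup(m) = fs * m and production factor fp."""
    speedup = fs * m
    cu = costup(m, s)
    pu = (fp * pro11 * speedup - cu) / (pro11 - 1)  # Equation 9
    return {
        "m": m,
        "speedup": speedup,
        "costup": cu,
        "profitup": pu,
        "cost_effective": speedup > cu,
        "profit_effective": pu > 1,
    }


if __name__ == "__main__":
    for fp in (0.25, 1.2):  # sublinear and superlinear cases of Table 2
        print(f"fp = {fp}")
        for m in (2, 4, 8, 16, 32, 64, 128, 256, 512, 1024):
            r = classify(m, s=512, fp=fp, fs=0.25, pro11=4.0)
            print(f"  m = {m:4d}  cost-effective = {r['cost_effective']}  "
                  f"profit-effective = {r['profit_effective']}")
```

With these prices, 64 processors under fp = 0.25 come out cost-effective but not profit-effective, while 8 processors under fp = 1.2 come out profit-effective but not cost-effective, which is the pattern the table illustrates.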
ACKNOWLEDGMENTS

We appreciate the constructive comments from the anonymous referees. Fred Preston at the NASA Langley Research Center and our colleague Zhao Zhang read the paper and made helpful comments. This work has been supported in part by the National Science Foundation under grants CCR-9400719 and CCR-9812187, by the Sun Microsystems Computer Corporation under grant EDUE-NAFO-980405, and by the Air Force Office of Scientific Research under grant AFOSR-95-1-0215.

REFERENCES

1. G.M. Amdahl, “Validity of the Single Processor Approach to Achieving Large Scale Computing Capabilities,” Proc. American Federation of Information Processing Societies Conf., Thompson Books, Washington, D.C., 1967, pp. 438–485.
2. J.E. Smith, “Characterizing Computer Performance with a Single Number,” Comm. ACM, Vol. 31, No. 10, Oct. 1988, pp. 1202–1206.
3. X. Zhang, Y. Yan, and K. He, “Latency Metric: An Experimental Method for Measuring and Evaluating Program and Architecture Scalability,” J. Parallel and Distributed Computing, Vol. 22, No. 3, Sept. 1994, pp. 392–410.
4. D.A. Wood and M.D. Hill, “Cost-Effective Parallel Computing,” Computer, Vol. 28, No. 2, Feb. 1995, pp. 69–72.
5. J.L. Hennessy and D.A. Patterson, Computer Architecture: A Quantitative Approach, 2nd ed., Morgan Kaufmann, San Francisco, 1996.
6. B. Falsafi and D.A. Wood, “Cost/Performance of a Parallel Computer Simulator,” Proc. Eighth Workshop on Parallel and Distributed Simulation, IEEE Computer Society Press, Los Alamitos, Calif., 1994, pp. 173–182.
7. S.H. Fuller, “Price/Performance Comparison of C.mmp and the PDP-10,” Proc. Third Int’l Symp. Computer Architecture, ACM Press, New York, 1976, pp. 195–202.
8. J. Stiglitz, Principles of Microeconomics, W.W. Norton & Company, New York, 1993.

Yong Yan is a performance analyst responsible for the design and evaluation of multiprocessor systems at HAL Computer Systems Inc. He has published extensively in the areas of parallel and distributed computing, computer architecture, performance evaluation, operating systems, and algorithm analysis. He received his BS and MS in computer science from Huazhong University of Science and Technology, China, and his PhD in computer science from the College of William & Mary. He is a member of the IEEE and ACM. Contact him at the Multiprocessor Server Division, HAL Computer Systems Inc., Campbell, CA 95008; yyan@hal.com.

Xiaodong Zhang is a professor of computer science at the College of William & Mary. His research interests are parallel and distributed systems, computer-system performance evaluation, and scientific computing. He is an associate editor of IEEE Transactions on Parallel and Distributed Systems, and chairs the IEEE Computer Society Technical Committee on Supercomputing Applications. He received his BS in electrical engineering from Beijing Polytechnic University, China, and his MS and PhD in computer science from the University of Colorado at Boulder. Contact him at the Dept. of Computer Science, College of William & Mary, Williamsburg, VA 23187-8795; zhang@cs.wm.edu.