Parallel Programming in OpenMP

Rohit Chandra, Leonardo Dagum, Dave Kohr, Dror Maydan, Jeff McDonald, Ramesh Menon

About the Authors

Rohit Chandra is a chief scientist at NARUS, Inc., a provider of internet business infrastructure solutions. He previously was a principal engineer in the Compiler Group at Silicon Graphics, where he helped design and implement OpenMP.

Leonardo Dagum works for Silicon Graphics in the Linux Server Platform Group, where he is responsible for the I/O infrastructure in SGI's scalable Linux server systems. He helped define the OpenMP Fortran API. His research interests include parallel algorithms and performance modeling for parallel systems.

Dave Kohr is a member of the technical staff at NARUS, Inc. He previously was a member of the technical staff in the Compiler Group at Silicon Graphics, where he helped define and implement OpenMP.

Dror Maydan is director of software at Tensilica, Inc., a provider of application-specific processor technology. He previously was an engineering department manager in the Compiler Group of Silicon Graphics, where he helped design and implement OpenMP.

Jeff McDonald owns SolidFX, a private software development company. As the engineering department manager at Silicon Graphics, he proposed the OpenMP API effort and helped develop it into the industry standard it is today.

Ramesh Menon is a staff engineer at NARUS, Inc. Prior to NARUS, Ramesh was a staff engineer at SGI, representing SGI in the OpenMP forum. He was the founding chairman of the OpenMP Architecture Review Board (ARB) and supervised the writing of the first OpenMP specifications.

Foreword
by John L. Hennessy
President, Stanford University

FOR A NUMBER OF YEARS, I have believed that advances in software, rather than hardware, held the key to making parallel computing more commonplace. In particular, the lack of a broadly supported standard for programming shared-memory multiprocessors has been a chasm both for users and for software vendors interested in porting their software to these multiprocessors. OpenMP represents the first vendor-independent, commercial "bridge" across this chasm.

Such a bridge is critical to achieve portability across different shared-memory multiprocessors. In the parallel programming world, the challenge is to obtain both this functional portability as well as performance portability. By performance portability, I mean the ability to have reasonable expectations about how parallel applications will perform on different multiprocessor architectures. OpenMP makes important strides in enhancing performance portability among shared-memory architectures.

Parallel computing is attractive because it offers users the potential of higher performance. The central problem in parallel computing for nearly 20 years has been to improve the "gain to pain ratio." Improving this ratio, with either hardware or software, means making the gains in performance come at less pain to the programmer! Shared-memory multiprocessing was developed with this goal in mind. It provides a familiar programming model, allows parallel applications to be developed incrementally, and supports fine-grain communication in a very cost-effective manner. All of these factors make it easier to achieve high performance on parallel machines. More recently, the development of cache-coherent distributed shared memory has provided a method for scaling shared-memory architectures to larger numbers of processors.
In many ways, this development removed the hardware barrier to scalable, shared-memory multiprocessing. OpenMP represents the important step of providing a software standard for these shared-memory multiprocessors. Our goal now must be to learn how to program these machines effectively (i.e., with a high value for gain/pain). This book will help users accomplish this important goal. By focusing its attention on how to use OpenMP, rather than on defining the standard, the authors have made a significant contribution to the important task of mastering the programming of multiprocessors.

Contents

Foreword, by John L. Hennessy
Preface

Chapter 1  Introduction
    Performance with OpenMP
    A First Glimpse of OpenMP
    The OpenMP Parallel Computer
    Why OpenMP?
    History of OpenMP
    Navigating the Rest of the Book

Chapter 2  Getting Started with OpenMP
    Introduction
    OpenMP from 10,000 Meters
    OpenMP Compiler Directives or Pragmas
    Parallel Control Structures
    Communication and Data Environment
    Synchronization
    Parallelizing a Simple Loop
    Runtime Execution Model of an OpenMP Program
    Communication and Data Scoping
    Synchronization in the Simple Loop Example
    Final Words on the Simple Loop Example
    A More Complicated Loop
    Explicit Synchronization
    The reduction Clause
    Expressing Parallelism with Parallel Regions
    Concluding Remarks
    Exercises

Chapter 3  Exploiting Loop-Level Parallelism
    Introduction
    Form and Usage of the parallel do Directive
    Clauses
    Restrictions on Parallel Loops
    Meaning of the parallel do Directive
    Loop Nests and Parallelism
    Controlling Data Sharing
    General Properties of Data Scope Clauses
    The shared Clause
    The private Clause
    Default Variable Scopes
    Changing Default Scoping Rules
    Parallelizing Reduction Operations
    Private Variable Initialization and Finalization
    Removing Data Dependences
    Why Data Dependences Are a Problem
    The First Step: Detection
    The Second Step: Classification
    The Third Step: Removal
    Summary
    Enhancing Performance
    Ensuring Sufficient Work
    Scheduling Loops to Balance the Load
    Static and Dynamic Scheduling
    Scheduling Options
    Comparison of Runtime Scheduling Behavior
    Concluding Remarks
    Exercises

Chapter 4  Beyond Loop-Level Parallelism: Parallel Regions
    Introduction
    Form and Usage of the parallel Directive
    Clauses on the parallel Directive
    Restrictions on the parallel Directive
    Meaning of the parallel Directive
    Parallel Regions and SPMD-Style Parallelism
    threadprivate Variables and the copyin Clause
    The threadprivate Directive
    The copyin Clause
    Work-Sharing in Parallel Regions
    A Parallel Task Queue
    Dividing Work Based on Thread Number
    Work-Sharing Constructs in OpenMP
    Restrictions on Work-Sharing Constructs
    Block Structure
    Entry and Exit
    Nesting of Work-Sharing Constructs
    Orphaning of Work-Sharing Constructs
    Data Scoping of Orphaned Constructs
    Writing Code with Orphaned Work-Sharing Constructs
    Nested Parallel Regions
    Directive Nesting and Binding
    Controlling Parallelism in an OpenMP Program
    Dynamically Disabling the parallel Directives
    Controlling the Number of Threads
    Dynamic Threads
    Runtime Library Calls and Environment Variables
    Concluding Remarks
    Exercises
Chapter 5  Synchronization
    Introduction
    Data Conflicts and the Need for Synchronization
    Getting Rid of Data Races
    Examples of Acceptable Data Races
    Synchronization Mechanisms in OpenMP
    Mutual Exclusion Synchronization
    The Critical Section Directive
    The atomic Directive
    Runtime Library Lock Routines
    Event Synchronization
    Barriers
    Ordered Sections
    The master Directive
    Custom Synchronization
    The flush Directive
    Rolling Your Own
    Some Practical Considerations
    Concluding Remarks
    Exercises

Chapter 6  Performance
    Introduction
    Key Factors That Impact Performance
    Coverage and Granularity
    Load Balance
    Locality
    Synchronization
    Performance-Tuning Methodology
    Dynamic Threads
    Bus-Based and NUMA Machines
    Concluding Remarks
    Exercises

Appendix A  A Quick Reference to OpenMP

References
Index

Preface

OPENMP IS A PARALLEL PROGRAMMING MODEL for shared memory and distributed shared memory multiprocessors. Pioneered by SGI and developed in collaboration with other parallel computer vendors, OpenMP is fast becoming the de facto standard for parallelizing applications. There is an independent OpenMP organization today with most of the major computer manufacturers on its board, including Compaq, Hewlett-Packard, Intel, IBM, Kuck & Associates (KAI), SGI, Sun, and the U.S. Department of Energy ASCI Program. The OpenMP effort has also been endorsed by over 15 software vendors and application developers, reflecting the broad industry support for the OpenMP standard.

Unfortunately, the main information available about OpenMP is the OpenMP specification (available from the OpenMP Web site at www.openmp.org). Although this is appropriate as a formal and complete specification, it is not a very accessible format for programmers wishing to use OpenMP for developing parallel applications. This book tries to fulfill the needs of these programmers.

This introductory-level book is primarily designed for application developers interested in enhancing the performance of their applications by utilizing multiple processors. The book emphasizes practical concepts and tries to address the concerns of real application developers. Little background is assumed of the reader other than single-processor programming experience and the ability to follow simple program examples in the C and Fortran programming languages.
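For readers who have never seen OpenMP, the following minimal sketch illustrates the directive-based style the book teaches: a single comment-like directive asks the compiler to divide a loop's iterations among threads. The program is a hypothetical illustration, not an example taken from the book, and assumes an OpenMP-aware Fortran compiler (for example, gfortran -fopenmp).

      program first_glimpse
      ! A hypothetical, minimal OpenMP example (not from the book):
      ! the parallel do directive asks the compiler to run the
      ! iterations of the following loop concurrently on several threads.
      integer, parameter :: n = 1000000
      real :: x(n), y(n)
      integer :: i
      x = 1.0
      y = 2.0
!$omp parallel do
      do i = 1, n
         y(i) = y(i) + 2.0 * x(i)
      end do
!$omp end parallel do
      ! A compiler without OpenMP support treats the directive as an
      ! ordinary comment, so the same source also compiles and runs serially.
      print *, 'y(1) =', y(1), '  y(n) =', y(n)
      end program first_glimpse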
