LLNL/LANL ASCI Level 3 Contract, 2001

FINAL REPORT

Dr. Allen D. Malony (PI), University of Oregon

This report presents the results of work on the ASCI Level 3 contract, Agreement No. B513235.
The original Statement of Work (SOW) is annotated to indicate the status of each task.

Instrumentation / Measurement
Goal: Integrate the TAU performance system with the dynamic instrumentation capabilities offered by
DyninstAPI. Enable TAU performance measurement on the Compaq Alpha Cluster. Improve the
PDT program analysis system for Fortran 90 instrumentation.
Tasks
1. INSTR-1: Develop dynamic TAU performance measurement mechanisms for MPI using
DyninstAPI.
Status: Complete. We implemented a technique that spawns a Dyninst mutator for each executable
image launched under MPI. The mutator inserts TAU instrumentation into the executable before
starting the MPI process and then waits for the child process to terminate. (This is similar to the
approach used in Dynaprof.) We demonstrated this capability with the SIMPLE hydrodynamics
benchmark in our PDPTA ’01 paper [1]. TAU v2.11 ships with support for DyninstAPI and MPI.
2. INSTR-2: Port the TAU performance measurement system to Compaq Alpha Cluster and
demonstrate with MPI applications.
Status: Complete. TAU supports the Compaq (cxx, f90) and KAI (KCC, KAP/Pro) compilers under
Tru64. TAU also supports Compaq Linux clusters. This capability has been demonstrated with the
SAMRAI (Andy Wissink, LLNL) and SAGE (Jack Horner, LANL) projects.
3. INSTR-3: Complete PDT F90 implementation.
Status: Complete. TAU’s PDT system now supports F90 as well as C99 and C++. The PDT F90
front end has been validated on F90 test suites from the University of Colorado ELI project and the
PCRC HPF compiler project. A total of 309 programs were tested with no errors reported.
4. INSTR-4: Develop tool for automatic source-level F90 instrumentation and demonstrate on F90
application code.
Status: Complete. The PDT F90 capability has been used to build F90 instrumentation support for
TAU. We have tested the instrumentor to a limited extent on the SAGE code and the POP code (Phil
Jones, LANL), and more extensively in the Caltech CACR ASCI/ASAP VTF project (Julian Cummings).
In addition to its use in the TAU F90 instrumentor, the PDT F90 capability is being used in the
CHASM [2] project (Craig Rasmussen, LANL).

Allen D. Malony Page 1 5/20/2018


Remarks
• INSTR-1 applies to MPI only; it does not cover MPI used in conjunction with threads.
• The automatic F90 instrumentation tool in INSTR-4 depended on DUCTAPE extensions, which
were implemented by Bernd Mohr, ZAM/FZJ, Germany.

Unified Parallel Software (UPS)


Goal: Apply TAU’s capabilities for portable, multi-language, multi-threaded performance measurement
and multi-level software mapping of performance data to UPS.
Tasks
1. UPS-1: Work with UPS developers to integrate TAU performance system for instrumentation,
measurement, and analysis in the UPS programming environment. In particular, this includes:
• Generating TAU event traces for analysis and visualization using Vampir.
• Using PCL or PAPI for hardware performance profiling.
• Developing a wrapper instrumentation scheme for UPS and system libraries.
• Identifying user-defined events of interest and opportunities for event mapping.
Status: Complete. TAU is available for use with the UPS system for both profile-based and
trace-based measurements with hardware performance monitoring capabilities. A significant
accomplishment that made TAU’s integration possible was the development of an automatic C
instrumentor, which allows fully automatic instrumentation of the UPS library source code.
2. UPS-2: Validate UPS/TAU performance measurement system on UPS-targeted ASCI platforms
using UPS validation benchmarks.
Status: Complete. The use of TAU with UPS on a UPS validation code was demonstrated to Richard
Barrett and Federico Bassetti at the LACSI Symposium 2001. Future work with Mike McKay at
LANL will focus on a performance study of UPS using TAU’s measurement support.

Multithreading and Hybrid Parallelism


Goal: Apply TAU in multithreaded C++ and OpenMP programming environments and develop
enhancements for hybrid (“mixed-mode”) parallel execution based on MPI.
Tasks
1. APP-1: Demonstrate TAU's ability to profile and trace example application codes developed with the
Overture framework.
Status: Complete. TAU is integrated with the Overture and AMRSim frameworks (Brian Miller,
CASC, LLNL); see Miller’s PDPTA ’01 paper [3]. PDT is also being used in these projects. Our
work with the Overture and AMRSim frameworks is continuing.
2. APP-2: Port TAU to multithreaded OpenMP environments, targeting the KAI KAP/Pro OpenMP
compiler in particular, and interact with OpenMP application developers in its use.
Status: Complete. TAU supports OpenMP programming environments in two forms: OpenMP
runtime system routine instrumentation and OpenMP source transformation. The latter method is
implemented using the Opari OpenMP directive rewriting tool of Bernd Mohr, ZAM/FZJ, Germany.
This work was reported in our EWOMP ’01 [4] and LACSI ’01 [5] papers. TAU supports KAI’s
KAP/Pro, SGI, IBM, Compaq, and PGI OpenMP compiler suites.
3. APP-3: Specify OpenMP runtime system “hooks” that OpenMP compiler vendors might provide that
could be used effectively by TAU for performance measurement.
Status: Complete. In association with Bernd Mohr, we defined the POMP performance interface for
OpenMP; see our LACSI ’01 paper [5]. POMP and Opari were demonstrated with both the TAU and
EXPERT performance measurement and analysis systems. The POMP specification has been
presented to the OpenMP Future Committee and the OpenMP ARB. Work is currently underway to
merge the OMPI performance interface defined by the INTONE project with POMP. We have also been
closely involved with KAI on the OpenMP performance interface specification [6] for the ASCI Path
Forward, Ultrascale Tools Initiative, RTS – Parallel Systems Performance project. An ASCI report on
the OpenMP performance tool interface was jointly authored with KAI. KAI is working on an
implementation of the POMP interface.
4. APP-4: Enhance TAU for use in C++/MPI and OpenMP/MPI (OpenMPI) hybrid parallel execution
environments and demonstrate on selected applications.
Status: Complete. TAU now supports several hybrid execution models, including C++/MPI, OpenMP/MPI,
and even Java/MPI. Multi-level instrumentation is applied (Sameer Shende, Ph.D. thesis [7]) using
PDT source instrumentation, MPI wrapper library instrumentation, and POMP/Opari for OpenMP
instrumentation. We presented this work in our SC ’01 tutorial [8]. C++/MPI hybrid
performance measurement is also used in our work with the University of Utah ASCI/ASAP
C-SAFE project (Chris Johnson and Steve Parker). This work will be published in ISHPC ’02 [9].
5. APP-5: Support POOMA 2.4 development team in the use of TAU for performance instrumentation,
measurement, and analysis.
Status: Complete. We continued to support the POOMA development team in response to their
requests. In particular, on the recommendation of Jeffrey Oldham (CodeSourcery, LLC), TAU’s PDT
instrumentor was extended with selective instrumentation capabilities. A -noinline option was added
to suppress instrumentation of inlined procedures. TAU and PDT are available for download from the
POOMA webpage.

Personnel

The work described was performed by:

• Dr. Allen D. Malony: Associate Professor
• Sameer Shende: Postdoctoral Research Associate
• Robert Ansell-Bell: Research Associate

References
1. S. Shende, A. Malony, and R. Ansell-Bell, "Instrumentation and Measurement Strategies for
Flexible and Portable Empirical Performance Evaluation," Proc. Int'l. Conf. on Parallel and
Distributed Processing Techniques and Applications (PDPTA 2001), June 2001.

2. C. Rasmussen, K. Lindlan, B. Mohr, J. Striegnitz, "CHASM: Static Analysis and Automatic
Code Generation for Improved Fortran 90 and C++ Interoperability," Proc. Los Alamos Computer
Science (LACSI) Symp. 2001, Oct. 2001.

3. B. Miller, B. Phillip, D. Quinlan, and A. Wissink, "AMRSim: An Object-oriented Performance
Simulator for Parallel Adaptive Mesh Refinement," Proc. Int'l. Conf. on Parallel and Distributed
Processing Techniques and Applications (PDPTA 2001), June 2001.

4. B. Mohr, A. Malony, S. Shende, and F. Wolf, "Towards a Performance Tool Interface for
OpenMP: An Approach Based on Directive Rewriting," Proc. Third European Workshop on
OpenMP (EWOMP 2001), Sept. 2001.

5. A. Malony, B. Mohr, S. Shende, and F. Wolf, "Design and Prototype of a Performance Tool
Interface for OpenMP," Proc. Los Alamos Computer Science (LACSI) Symp. 2001, Oct. 2001.

6. B. Kuhn, A. Malony, B. Mohr, and S. Shende, "A Performance Tool Interface for OpenMP,"
Report for Accelerated Strategic Computing Initiative (ASCI), ASCI Path Forward program,
Ultrascale Tools Initiative, RTS - Parallel System Performance, submitted by KAI Software, A
Division of Intel America, Inc., Aug. 2001.

7. S. Shende, "The Role of Instrumentation and Mapping in Performance Measurement," Ph.D.
Dissertation, University of Oregon, Aug. 2001.

8. A. Malony, B. Mohr, and S. Shende, "Performance Technology for Complex Parallel Systems,"
SC 2001 tutorial, Nov. 2001.

9. D. St. Germain, A. Morris, S. Parker, A. Malony, and S. Shende, "Integrating Performance
Analysis in the Uintah Software Development Cycle," Int'l. Symp. on High Performance Computing
(ISHPC-IV), May 2002.
