Performance Guidelines for IBM InfoSphere DataStage Jobs Containing Sort Operations on Intel Xeon Servers
Revision: 1.0
Intended for public distribution
Date: January 31, 2011

Authors:
Garrett Drysdale, Intel Corporation
Jantz Tran, Intel Corporation
Sriram Padmanabhan, IBM
Brian Caufield, IBM
Fan Ding, IBM
Ron Liu, IBM
Pin Lp Lv, IBM
Mi Wan Shum, IBM
Jackson Dong Jie Wei, IBM
Samuel Wong, IBM
Table of Contents
1. Introduction
2. Overview of IBM InfoSphere DataStage
3. Overview of Intel Xeon Series X7500 Processors
4. Sort Operation in IBM InfoSphere DataStage
   Testing Configurations
5. Summary for Sort Performance Optimizations
6. Recommendations for Optimizing Sort Performance
   6.1 Optimal RMU Tuning
   6.2 Final Merge Sort Phase Tuning Using Linux Read Ahead
   6.3 Using a Buffer Operator to Minimize Latency for Sort Input
   6.4 Minimizing I/O for Sort Data Containing Variable-Length Fields
   6.5 Future Study: Using Memory for a RAM-Based Scratch Disk
7. Conclusion
8. About the Authors
9. Legal Disclaimer (Intel)
10. Legal Disclaimer (IBM)
1. Introduction
This whitepaper is the first in an anticipated series intended to provide IBM InfoSphere DataStage customers with helpful performance tuning guidelines for deployment on Intel Xeon processor-based platforms. IBM and Intel began collaborating in 2007 to optimize the performance and ROI of the combination of IBM InfoSphere DataStage and Intel Xeon based platforms. Our goal is not only to optimize performance, and therefore reduce the total cost of ownership, of this powerful combination in future versions of IBM InfoSphere DataStage on future Intel processors, but also to pass along the tuning and configuration guidance we discover along the way.

In our work together, we are striving to understand the execution characteristics of DataStage jobs on Intel platforms. This information is used to determine the hardware configurations, operating system settings, and job design and tuning techniques that optimize performance. Because of the highly scalable capabilities of IBM InfoSphere DataStage, our tests focus on the latest Intel Xeon X7560 (Xeon EX) processors, which support 4- and 8-socket systems. Initially, we are testing with four-socket configurations.

We presented information about IBM InfoSphere DataStage on Intel platforms at the 2009 and 2010 IBM Information on Demand conferences. In 2009, our audience applauded the great scalability of IBM InfoSphere DataStage on Intel platforms, but asked us to provide more information on the I/O requirements of jobs and how to get the most out of existing platform I/O capability. Since then, we have found ways to increase the overall performance of all jobs in the new Information Server 8.5 version of IBM InfoSphere DataStage, which is now a 64-bit binary on Intel platforms, and we have investigated the I/O requirements of sorting. This paper focuses on the key findings we obtained regarding configuration of the platform, the operating system, and DataStage jobs that contain sort operators.
Sort is a crucial operation in data integration software. Sort operations are I/O intensive and can place a significant load on the temporary, or scratch, file system. To optimize server CPU utilization, the scratch I/O storage system must be capable of providing the disk bandwidth demanded by the sort operations. A scratch storage system that cannot write or read data at a high enough bandwidth leads to under-utilization of the computing capability of the system, which will be observed as low CPU utilization. This paper provides recommendations that reduce the bandwidth demand placed on the scratch storage I/O system by sort operations. These I/O reductions can yield significant performance improvements on systems where the scratch I/O storage system is significantly undersized relative to the compute capability of the processors; we show such a scenario in this paper. Ideally, the best solution is to upgrade the scratch I/O storage subsystem to match the compute capability of the server.
2. Overview of IBM InfoSphere DataStage

IBM InfoSphere DataStage supports a broad range of data integration capabilities, including:

- General processing stages such as filter, sort, join, union, lookup, and aggregations
- Built-in and custom transformations
- Copy, move, FTP, and other data movement stages
- Real-time, XML, SOA, and message queue processing

Additionally, IBM InfoSphere DataStage allows pre- and post-conditions to be applied to all these stages. Multiple jobs can be controlled and linked by a sequencer, which provides the control logic used to process the appropriate data integration jobs. IBM InfoSphere DataStage also supports rich administration capabilities for deploying, scheduling, and monitoring jobs.

One of the great strengths of IBM InfoSphere DataStage is that job designs require very little consideration of the underlying structure of the system and do not typically need to change when that structure changes. If the system is upgraded or improved, or if a job is developed on one platform and implemented on another, the job design does not necessarily have to change. IBM InfoSphere DataStage learns the shape and size of the system from the IBM InfoSphere DataStage configuration file, and it organizes the resources needed for a job according to what is defined in that file. When a system changes, the file is changed, not the jobs.

A configuration file defines one or more processing nodes on which the job will run. The processing nodes are logical rather than physical; the number of processing nodes does not necessarily correspond to the number of cores in the system. The following factors affect the optimal degree of parallelism:

- CPU-intensive applications, which typically perform multiple CPU-demanding operations on each record, benefit from the greatest possible parallelism, up to the capacity supported by a given system.
- Jobs with large memory requirements can benefit from parallelism if they act on data that has been partitioned and if the required memory is also divided among partitions.
- Applications that are disk- or I/O-intensive, such as those that extract data from and load data into databases, benefit from configurations in which the number of logical nodes equals the number of I/O paths being accessed. For example, if a table is partitioned 16 ways inside a database, or if a data set is spread across 16 disk drives, set up a node pool consisting of 16 processing nodes.

Another great strength of IBM InfoSphere DataStage is that it does not rely on the functions and processes of a database to perform transformations: while IBM InfoSphere DataStage can generate complex SQL and leverages databases, it is designed from the ground up as a multipath data integration engine equally at home with files, streams, databases, and internal caching in single-machine, cluster, and grid implementations. As a result, customers in many circumstances find they do not also need to invest in staging databases to support IBM InfoSphere DataStage.
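As an illustration, a minimal two-node configuration file of the kind described above might look like the following sketch. The host name and directory paths are placeholders, not values from the tested system; note that each node's scratchdisk resource points at a separate physical device, the technique discussed later in this paper.

```
{
    node "node1"
    {
        fastname "etlhost"
        pools ""
        resource disk "/data/ds/node1" {pools ""}
        resource scratchdisk "/scratch1/node1" {pools ""}
    }
    node "node2"
    {
        fastname "etlhost"
        pools ""
        resource disk "/data/ds/node2" {pools ""}
        resource scratchdisk "/scratch2/node2" {pools ""}
    }
}
```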
3. Overview of Intel Xeon Series X7500 Processors

With intelligent performance, a new high-bandwidth interconnect architecture, and greater memory capacity, platforms based on the Intel Xeon series 7500 processor are ideal for demanding workloads. A standard four-socket server provides up to 32 processor cores, 64 execution threads, and a full terabyte of memory. Eight-socket and larger systems are in development by leading system vendors. The Intel Xeon series 7500 processor also includes more than 20 new reliability, availability, and serviceability (RAS) features that improve data integrity and uptime. One of the most important is Intel Machine Check Architecture Recovery, which allows the operating system to take corrective action and continue running when uncorrected errors are detected. These highly scalable servers can be used to support enormous user populations. Server platforms based on the Intel Xeon series 7500 processor deliver a number of additional features that help to improve performance, scalability, and energy efficiency. Next-generation Intel Virtualization Technology (Intel VT) provides extensive hardware assists in processors, chipsets, and I/O devices to enable fast application performance in virtual machines, including near-native I/O performance. Intel VT also supports live virtual machine migration among current and future Intel Xeon processor-based servers, so businesses maintain a common pool of virtualized resources as they add new servers. Intel QuickPath Interconnect (QPI) Technology provides point-to-point links to distributed shared memory. The Intel Xeon 7500 series processors with QPI feature two integrated memory controllers and 3 QPI links to deliver scalable interconnect bandwidth, outstanding memory performance, flexibility, and tightly integrated interconnect RAS features. Technical articles on QPI can be found at http://www.intel.com/technology/quickpath/.
- Intel Turbo Boost Technology boosts performance when it is needed most by dynamically increasing core frequencies beyond rated values for peak workloads.
- Intel Intelligent Power Technology adjusts core frequencies to conserve power when demand is lower.
- Intel Hyper-Threading Technology can improve throughput and reduce latency for multithreaded applications and for multiple workloads running concurrently in virtualized environments.

For additional information on the Intel Xeon Series 7500 Processor for mission critical applications, please see http://www.intel.com/pressroom/archive/releases/20100330comp_sm.htm.
4. Sort Operation in IBM InfoSphere DataStage

Figure 1 - Sort operation overview

The sort buffer is used during both the initial sort phase and the final merge phase of the sort operation. During the final merge phase, a block of data is read from the beginning of each of the temporary sorted files stored on the scratch file system. If the sort buffer is too small, there will not be enough memory to read a chunk of data from each of the temporary sort files produced by the initial sort phase. This condition is detected during the initial sort phase and, if it occurs, a second thread runs to pre-merge the temporary sort files. Pre-merging reduces the number of temporary sort files so that the buffer has sufficient space to load a block of data from each remaining file during the final merge. In the following tests, we show several tuning and configuration settings that can be used to reduce the I/O demand that sort operations place on the system.
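The two-phase pattern described above — sorted runs spilled to scratch files, then stream-merged — can be sketched in a few lines of Python. This is a generic external-sort illustration of why scratch I/O bandwidth matters, not DataStage's actual implementation:

```python
import heapq
import tempfile

def external_sort(records, buffer_size):
    """Sort an iterable using bounded memory: write sorted runs to
    scratch files, then stream-merge them (generic illustration)."""
    scratch = []                      # temporary sorted run files
    run = []
    for rec in records:
        run.append(rec)
        if len(run) >= buffer_size:   # sort buffer full: spill a run
            run.sort()
            f = tempfile.TemporaryFile(mode="w+")
            f.writelines(line + "\n" for line in run)
            f.seek(0)
            scratch.append(f)
            run = []
    run.sort()                        # last (possibly partial) run stays in memory
    # Final merge phase: stream a block from the head of every run.
    runs = [(line.rstrip("\n") for line in f) for f in scratch]
    return list(heapq.merge(run, *runs))

# Usage: sort 10 records with a 3-record buffer, forcing spills + merge.
print(external_sort(["d", "a", "c", "b", "f", "e", "h", "g", "j", "i"], 3))
```

Every record is written to scratch once and read back once during the merge; if pre-merging is triggered, records make additional round trips, which is exactly the extra I/O the RMU tuning in section 6.1 eliminates.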
Testing Configurations
The testing was done on a single Intel server with the Intel Xeon 7500 series chipset and four Intel Xeon X7560 processors. The X7560 processors are based on the Nehalem microarchitecture. The system has 4 sockets, 8 cores per socket, and 2 threads per core using Intel Hyper-Threading Technology, for a total of 64 threads of execution. Our test configuration uses 64 GB of memory, though the platform has a maximum capacity of 1 TB. The processor operating frequency is 2.26 GHz, and each processor has 24 MB of L3 cache shared across the 8 cores. The system uses 5 Intel X25-E solid state drives (SSDs) configured in a RAID-0 array using the onboard RAID controller; this storage is used as scratch storage for the sort tests. The bandwidth capability of the 5 SSDs was not sufficient to maximize the CPU utilization of the system given the high performance capabilities of DataStage, as explained in more detail later. We recommend sizing the I/O subsystem to maximize CPU utilization, although we were not able to do this given the equipment available at the time of data collection. The operating system is Red Hat* Enterprise Linux* 5.3, 64-bit version. The test environment is a standard Information Server two-tier configuration. The client tier runs just the DataStage client applications; all the remaining Information Server tiers are installed on a single Intel Xeon X7560 server.
Test Client(s):
  Client
  Platform: Windows Server 2003
  Processor: x86-based PC, 2.4 GHz
  Memory: 8 GB RAM

Information Server (IS) Tiers (Services + Repository + Engine):
  Platform: Red Hat EL 5.3, 64-bit
  Processor: Intel Xeon X7560, 4 sockets, 32 cores, 64 threads, 2.26 GHz
  Memory: 64 GB RAM
  Metadata Repository: DB2/LUW 9.7 GA
  Scratch: 5 Intel X25-E SSDs configured as a RAID-0 array using the onboard controller
  IS Topology: Standalone
Figure 2 - System Test Configuration

The following table lists the specifics of the platform tested:

  OEM:                   Intel
  CPU Model ID:          7560
  Platform Name:         Boxboro
  Sockets:               4
  Cores per Socket:      8
  Threads per Core:      2
  CPU Code Name:         Nehalem-EX
  CPU Frequency (GHz):   2.24
  QPI GT/s:              6.4
  Hyperthreading:        Enabled
  Prefetch Settings:     Default
  LLC Size (MB):         24
  BIOS Version:          R21
  Memory Installed (GB): 64
  DIMM Type:             DDR3-1066
  DIMM Size (GB):        4
  Number of DIMMs:       16
  NUMA:                  Enabled
  OS:                    RHEL 5.3 64-bit

Table 1 - Intel Platform Tested
In this configuration file, each DataStage processing node has its own scratch space defined in a directory that resides on separate physical devices. This helps prevent contention for I/O subsystem resources among DataStage processing nodes. This is a fairly well-known technique and was not studied for this paper.

5. Summary for Sort Performance Optimizations

This paper describes additional techniques to achieve optimal performance for DataStage jobs containing sort operations:

1) Setting the Restrict Memory Usage (RMU) parameter for sort operations to an appropriate value for large data sets will reduce I/O demand on the scratch file system. The recommended RMU size varies with the data set size and node count. Section 6.1 includes a reference table that summarizes the suggested RMU sizes for a variety of data set sizes and node counts. The RMU parameter gives users the flexibility to define the sort buffer size to optimize memory usage of their system.

2) Increasing the default Linux read-ahead value for the disk storage system(s) used for scratch space can increase the performance of the final merge phase of sort operations. The recommended read-ahead value is 512 or 1024 sectors (256 KB or 512 KB) for the scratch file system. See section 6.2 for information on how to change the read-ahead value in Linux.

3) Sort operations can benefit from having a buffer operator inserted before the sort in the data flow graph. Because sort operations work on large amounts of data, a buffer operator provides extra storage to get the data to the sort operator as fast as possible. See section 6.3 for details.

4) Enabling the APT_OLD_BOUNDED_LENGTH setting can decrease I/O demand during sorting when bounded-length VARCHAR data is involved, potentially improving overall throughput for the job. See section 6.4 for details.
6. Recommendations for Optimizing Sort Performance

6.1 Optimal RMU Tuning
This section describes how to tune the sort buffer size parameter, called RMU, to minimize I/O demand on the scratch I/O system. An RMU value that is too small will result in intermediate merges of temporary files during the initial sort phase, and these intermediate merges can significantly increase the I/O demand on the scratch file system. Tuning the RMU value appropriately can eliminate all intermediate merge operations and greatly increase the throughput of sort operations on systems with limited bandwidth to the scratch file system. On many systems, the scratch disk I/O system is a performance bottleneck because the disks or their interconnect provide less bandwidth than is needed to maximize CPU utilization. Eliminating pre-merging reduces the overall I/O demand on the scratch file system, allowing the scratch file system to complete I/O faster, increasing throughput, and decreasing job run time.
  Data Size     1 Node   4 Nodes  8 Nodes  16 Nodes  24 Nodes  32 Nodes  48 Nodes  64 Nodes
  to be Sorted  Min RMU  Min RMU  Min RMU  Min RMU   Min RMU   Min RMU   Min RMU   Min RMU
  (GB)          (MB)     (MB)     (MB)     (MB)      (MB)      (MB)      (MB)      (MB)
  1             Default  Default  Default  Default   Default   Default   Default   Default
  1.5           Default  Default  Default  Default   Default   Default   Default   Default
  3             28       Default  Default  Default   Default   Default   Default   Default
  10            51       25       Default  Default   Default   Default   Default   Default
  30            88       44       31       22        Default   Default   Default   Default
  100           160      80       57       40        33        28        23        20
  300           277      139      98       69        57        49        40        35
  1000          506      253      179      126       103       89        73        63
  3000          876      438      310      219       179       155       126       110
  10000         1600     800      566      400       327       283       231       200

Table 2 - RMU Buffer Size Table
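The formula referenced earlier was not preserved in this copy of the paper. The values in Table 2 are, however, closely fit by MinRMU ≈ 16·√(data size per node in GB) MB. The sketch below encodes that fit; it is an inference from the table, not IBM's published formula, and the 20 MB default-buffer threshold is an assumption chosen to reproduce the table's "Default" entries:

```python
import math

def min_rmu_mb(data_size_gb, nodes, default_mb=20):
    """Approximate the minimum RMU sort-buffer size (MB) for a data set
    spread across `nodes` parallel nodes. The 16*sqrt(GB-per-node) fit
    is inferred from Table 2, not an official IBM formula. Returns None
    when the default sort buffer (assumed ~20 MB) is already sufficient."""
    rmu = 16 * math.sqrt(data_size_gb / nodes)
    return rmu if rmu >= default_mb else None

# Examples matching Table 2: 100 GB on 1 node -> ~160 MB,
# 1000 GB on 4 nodes -> ~253 MB, 1 GB on 1 node -> Default.
print(round(min_rmu_mb(100, 1)))    # ~160
print(round(min_rmu_mb(1000, 4)))   # ~253
print(min_rmu_mb(1, 1))             # None (use the default)
```

Reproducing all of Table 2 this way suggests the minimum RMU grows with the square root of the per-node data volume, so doubling the node count reduces each node's buffer requirement by about 30%, not 50%.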
Our test results of a job consisting of one sort stage running with 4 parallel nodes with two different RMU values are shown in Figure 3. The correct sizing of the RMU value resulted in a 36% throughput increase. In the tests, the I/O bandwidth did not decrease because the I/O subsystem was delivering the maximum bandwidth it was capable of in both cases. However, because the total quantity of data transferred was much lower, the CPU cores were able to operate at higher CPU utilization and complete the sort in a shorter amount of time. This optimization is very effective for scratch disks that are unable to deliver enough scratch file I/O bandwidth to feed the high performing Intel Xeon Server and highly efficient IBM InfoSphere DataStage Software. The results shown here are for a sort only job where we have isolated the effect of the RMU parameter. This optimization will help more complex jobs, but will only directly affect the performance of the sort operators within the job.
Figure 3 - Performance Tuning Sort with Sort Operator RMU Value (RMU size 10 MB vs. 30 MB)

To modify the RMU setting for a Sort stage, open the Sort stage on the DataStage Designer client canvas, select the Stage tab, then Properties, click Options in the left pane, and select Restrict Memory Usage (MB) from the "Available properties to add" window to add it.
Figure 4 - Adding RMU Option

Once the Restrict Memory Usage option is added, its value can be set to the recommended value based on Table 2.
6.2 Final Merge Sort Phase Tuning Using Linux Read Ahead
During testing of the single node sort job, we found that CPU utilization of final merge can be improved by changing the scratch disk read ahead setting in Linux, resulting in substantial throughput improvements of the final merge sort phase.
Figure 6 - Performance Tuning Sort Operator with Linux Read Ahead Setting (RMU size 30 MB in both cases)

The current read-ahead setting in Linux can be obtained using the following command:

  hdparm -a /dev/sdb1

To set the read-ahead value for a specific disk device in Linux, use the following command:

  hdparm -a 1024 /dev/sdb1    (sets read ahead to 1024 sectors on disk device /dev/sdb1)
To make the setting persist across reboots, add the command to a boot-time script such as /etc/rc.local (RHEL) or /etc/init.d/boot.local (SUSE). Recommended settings to try are 512 sectors (256 KB) or 1024 sectors (512 KB). Increasing the read-ahead size results in more data being read from the disk and stored in the OS disk cache. As a result, more read requests by the sort operator get the requested data directly from the OS disk cache instead of waiting out the full latency of a read from the scratch storage system. (Note that the Linux file system cache is controlled by the kernel and uses memory that is not allocated to processes.) In our tests, the scratch storage system consists of SSDs configured in a RAID-0 array; I/O request latencies are low on this system compared to typical rotating-media storage arrays. Increasing OS read-ahead will benefit scratch storage arrays consisting of HDDs even more, and larger read-ahead values than those tested may be more beneficial for HDD arrays. We chose SSDs because they provide higher bandwidth, much improved IOPS (I/Os per second), and much lower latency than an equivalent number of hard disk drives. Many RAID controllers found in commercial storage systems can also perform read-ahead on read requests and store the data in their cache. It is good to enable this feature if it is available on the storage array being used for scratch storage, but it is still important to increase read-ahead in the OS: serving requests from the OS disk cache is faster than waiting for data from the RAID engine. The results shown here are for a job with a sort operation only. Tuning read-ahead will not impact the performance of other operations in the job that do not perform scratch disk I/O.
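An alternative to hdparm is the blockdev utility, which reads and writes the same per-device read-ahead value. The device name below is an example; substitute the device backing your scratch file system:

```shell
# Query the current read-ahead (in 512-byte sectors) for the scratch device.
blockdev --getra /dev/sdb1

# Set read-ahead to 1024 sectors (512 KB).
blockdev --setra 1024 /dev/sdb1

# Persist across reboots by appending the command to a boot-time script.
echo "blockdev --setra 1024 /dev/sdb1" >> /etc/rc.local
```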
6.3 Using a Buffer Operator to Minimize Latency for Sort Input
The DataStage parallel engine employs buffering automatically as part of its normal operations. Because the initial sort phase has such a high demand for input data, it is especially sensitive to latency spikes in the data source feeding the sort. These latency spikes can occur due to data being sourced from local or remote disks, or due to scheduling of operators by the operating system. By adding an additional buffer in front of the sort,
we were able to maintain the CPU utilization on the core running the sort thread at 100% during the entire initial sort phase, thus increasing the performance of the initial sort phase by nearly 7%.
6.4 Minimizing I/O for Sort Data Containing Variable-Length Fields
By default, the parallel engine internally handles bounded length VARCHAR fields (those that specify a maximum length) as essentially fixed length character strings. If the actual data in the field is less than the maximum length, the string is padded to the maximum size. This behavior is efficient for CPU processing of
records throughout the course of an entire job flow, but it increases the I/O demands of operations such as sort. When the environment variable APT_OLD_BOUNDED_LENGTH is set, the data within each VARCHAR field is processed without additional padding, decreasing the amount of data written to disk. This can reduce I/O bandwidth demand and therefore increase job throughput when the scratch disk subsystem has insufficient bandwidth to keep up with the processing capability of DataStage and the Intel Xeon server. Note that additional CPU cycles are used to process variable-length data when APT_OLD_BOUNDED_LENGTH is set: this setting trades CPU processing power for a reduction in the I/O required from the scratch file system.

In our tests, a job consisting of one sort stage running with 16 parallel nodes using APT_OLD_BOUNDED_LENGTH showed a 25% reduction in the size of temporary sort files and a 26% increase in throughput (a 21% reduction in runtime).

  Normalized Comparison             Default   With APT_OLD_BOUNDED_LENGTH
  Scratch Storage Space Consumed    1.0       0.75x (75% of the original storage space used)
  Runtime                           1.0       0.79x (79% of the original runtime)
  Throughput                        1.0       1.26x (26% increase in job processing rate)

Table 3 - Sort operation performance comparison using APT_OLD_BOUNDED_LENGTH

Please note that the performance benefit of this tuning parameter will vary based on several factors. It applies only to data records that have VARCHAR fields. The actual file size reduction realized on the scratch storage system depends heavily on the maximum size specified for the VARCHAR fields, the size of the actual data contained in those fields, and whether the VARCHAR fields are a sort key for the records.
The amount of performance benefit will depend on how much the total file size is reduced, along with the data request rate of the sort operations compared to the capability of the scratch file system to supply the data. In our test configuration, the 16 node test resulted in the scratch I/O system being driven to its maximum bandwidth limit. By setting APT_OLD_BOUNDED_LENGTH, the amount of data that was written and subsequently read from the disk decreased substantially over the length of the job allowing faster completion.
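The dependence on field fill ratio can be illustrated with a toy calculation. The field sizes and the 4-byte length prefix below are invented for illustration, not DataStage's actual record layout:

```python
def scratch_bytes(values, max_len, padded):
    """Bytes written for one bounded-length VARCHAR column: padded out
    to max_len in the default handling, or actual length plus an
    assumed 4-byte length prefix in unpadded handling (toy model)."""
    if padded:
        return sum(max_len for _ in values)
    return sum(4 + len(v) for v in values)

# Toy column: VARCHAR(100) fields that are on average far shorter.
col = ["short", "a slightly longer value", "x" * 40]
default_io = scratch_bytes(col, 100, padded=True)    # 300 bytes
tuned_io = scratch_bytes(col, 100, padded=False)     # 80 bytes
print(default_io, tuned_io, round(tuned_io / default_io, 2))
```

In this toy model the unpadded form writes roughly a quarter of the bytes; with mostly-full fields the two forms converge, which is why the realized benefit varies from job to job.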
6.5 Future Study: Using Memory for a RAM-Based Scratch Disk
As a future study, we intend to investigate performance when using a RAM-based disk for scratch storage. The memory bandwidth available in the Nehalem-EX test system is greater than 70 GB/s when correctly configured. While SSDs offer some bandwidth improvements over hard disk drives, they cannot begin to
match the performance of main memory bandwidth. The system supports enough PCI Express lanes to reach ~35 GB/s of I/O in each direction if all PCIe lanes are utilized; however, such an I/O solution would be expensive. The currently available 4-socket Intel X7560 systems can address 1 TB of memory, and 8-socket systems can address 2 TB. DRAM capacity will continue to rise with new product releases, and IBM X series systems also offer options to increase DRAM capacity beyond the baseline. While DRAM is expensive compared to disk drives on a per-capacity basis, it is more favorable when comparing bandwidth capability in and out of the system. We plan to evaluate the performance and cost-benefit of large in-memory storage compared to disk-based storage solutions and provide the results in the near future.
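A RAM-based scratch disk of the kind proposed can be prototyped on Linux with tmpfs. The size and mount point below are illustrative placeholders; note that tmpfs contents are lost on reboot and compete with processes for physical memory:

```shell
# Create a mount point and mount a 32 GB RAM-backed file system on it.
mkdir -p /ds/scratch_ram
mount -t tmpfs -o size=32g tmpfs /ds/scratch_ram

# Then point the DataStage configuration file's scratchdisk resource at it,
# e.g.:  resource scratchdisk "/ds/scratch_ram" {pools ""}
```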
7. Conclusion
We have shown how to optimize IBM InfoSphere DataStage sort performance on Intel Xeon processors using a variety of tuning options: the sort buffer RMU size, Linux read-ahead settings, an additional buffer operator, and the APT_OLD_BOUNDED_LENGTH setting for bounded-length VARCHAR fields.
Our results reinforce the necessity of correctly sizing I/O to optimize server performance. For sort, it is imperative to have sufficient scratch I/O storage performance for all sort operators running concurrently in the system in order to fully utilize the server. Powerful mission critical servers like the Intel Xeon platforms based on the X7500 series processor running the IBM InfoSphere DataStage parallel engine can process data at extremely high rates. As a result, I/O and network bandwidth are extremely important for high performance: network interconnects like 10 Gbit/s Ethernet or 40 Gbit/s Fibre Channel are necessary to fully realize the computational potential of this powerful combination of hardware and software. In the near future, we plan to analyze the cost-benefit trade-off of using large DRAM capacity as a replacement for disk subsystems for scratch I/O. We will also look at tuning high-bandwidth networking solutions to optimize performance.
8. About the Authors

Dr. Sriram Padmanabhan is an IBM Distinguished Engineer and Chief Architect for IBM InfoSphere Servers. Most recently, he led the Information Management Advanced Technologies team investigating new technical areas such as the impact of Web 2.0 information access and delivery. He was a Research Staff Member and then a manager of the Database Technology group at the IBM T.J. Watson Research Center for several years. He was a key technologist for DB2's shared-nothing parallel database feature and one of the originators of DB2's multi-dimensional clustering feature. He was also a chief architect for the Data Warehouse Edition, which provides integrated warehousing and business intelligence capabilities enhancing DB2. Dr. Padmanabhan has authored more than 25 publications, including a book chapter on DB2 in a popular database textbook, several journal articles, and many papers in leading database conferences. His email is srp@us.ibm.com.

Brian Caufield is a Software Architect for InfoSphere* Information Server, responsible for the definition and design of new IBM InfoSphere DataStage features, and also works with the Information Server Performance Team. Brian represents IBM at the TPC, working to define an industry-standard benchmark for data integration. Previously, Brian worked for 10 years as a developer on IBM InfoSphere DataStage, specializing in the parallel engine. His email is bcaufiel@us.ibm.com.

Fan Ding is currently a member of the Information Server Performance Team. Prior to joining the team, he worked in Information Integration Federation Server development. Fan has a Ph.D. in Mechanical Engineering and a Master's in Computer Science from the University of Wisconsin. His email is fding@us.ibm.com.
Ron Liu is currently a member of the IBM InfoSphere Information Server Performance Team, focusing on performance tuning and information integration benchmark development. Prior to his current role, Ron spent 7 years in Database Server development (federation runtime, wrapper, query gateway, process model, and database security). Ron has a Master of Science in Computer Science and a Bachelor of Science in Physics. His email is ronliu@us.ibm.com.
Pin Lp Lv is a Software Performance Engineer at IBM. Pin has worked for IBM since 2006. He worked as a software tester for the IBM WebSphere Product Center team and RFID team from September 2006 to March 2009, and joined the IBM InfoSphere Information Server Performance Team in April 2009. Pin has a Master of Science degree in Computer Science from the University of West Scotland. His email is pinlv@cn.ibm.com.

Mi Wan Shum is the manager of the IBM InfoSphere Information Server performance team at the IBM Silicon Valley Lab. She graduated from the University of Texas at Austin and has years of software development experience at IBM. Her email is msshum@us.ibm.com.

Jackson (Dong Jie) Wei is a Staff Software Performance Engineer at IBM. He worked as a DBA at CSRC before joining IBM in 2006. Since then, he has worked on the Information Server product, and in 2009 he began to focus on ETL performance. Jackson is also the technical lead for the IBM China Lab Information Server performance group. He received his bachelor's and master's degrees in Electronic Engineering from Peking University in 2000 and 2003, respectively. His email is weidongj@cn.ibm.com.

Samuel Wong is a member of the IBM InfoSphere Information Server performance team at the IBM Silicon Valley Lab. He graduated from the University of Toronto and has 12 years of software development experience with IBM. His email is samwong@us.ibm.com.