The efficient handling of internal tables is one of the most important performance factors in ABAP programs. Therefore it is essential to know the runtime behavior of internal table statements. This blog describes how to measure operations on internal tables and how to ensure that the measurement results are reliable.

1. Introduction

In ABAP there are three table types: standard table, sorted table and hashed table; and two main types of access, read with index and read with key (which has some subtypes). Regarding performance, the expectations for the table types and access types are the following:

The fastest accesses should be independent of the table size. This behavior should be realized by the index reads on standard and sorted tables. The hashed table allows no index read; instead it calculates a hash value from the table key, which also allows direct access to the searched line.

A binary search algorithm splits the search area into two parts at every step and checks which part contains the wanted entry. It can be applied if the table has a sort order, i.e. it is either a sorted table or a sorted standard table. The binary search should show a logarithmic dependence on the table size. It is used automatically by the read on a sorted table with table key. It can also be used on standard tables by adding BINARY SEARCH at the end of the READ statement. Here you must take care that the standard table is sorted in ascending order according to the used key. If the sort order is not fulfilled, the binary search still works, but it can miss entries. Please also be aware that a sort is an expensive operation; it should never be executed inside a large loop.
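The "first record according to the sort order" property of the binary search can be illustrated outside ABAP. The following Python sketch is illustrative only; the list, function and key names are invented, and `bisect_left` stands in for the kernel's search. It locates the first record whose leading key field matches:

```python
from bisect import bisect_left

# Table sorted ascending by (key1, key2) -- the precondition BINARY SEARCH relies on.
table = [(1, 1), (2, 1), (2, 2), (2, 3), (5, 1)]

def read_with_leading_key(tab, key1):
    """Return the index of the FIRST record whose leading key field equals key1,
    or None if there is no such record. Needs only O(log n) probes."""
    i = bisect_left(tab, (key1,))          # (key1,) sorts before every (key1, key2)
    if i < len(tab) and tab[i][0] == key1:
        return i
    return None

first = read_with_leading_key(table, 2)    # -> 1, the first of the three key1 = 2 records
miss  = read_with_leading_key(table, 4)    # -> None: sort order intact, so a miss is genuine
```

If the list were not sorted, `bisect_left` could skip over matching entries without reporting an error, which mirrors the warning above that a binary search on an unsorted standard table can miss entries.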
In principle, a table should be sorted exactly once during the execution of a program.

Both realizations of the binary search find not just any record fulfilling the search condition, but the first record according to the sort order. Therefore they also speed up a read with key where the key is not the complete table key but only a leading part of it.

All other reads must scan the whole table sequentially and need an average runtime per record which is directly proportional to the size of the table. These are: all reads from standard tables where no binary search is added and no index is used, all reads from sorted tables which specify no leading fields of the table key and no index, and all reads from hashed tables which do not specify the complete table key.

This blog has two goals: first, to demonstrate that the behavior is qualitatively as described above; and second, to establish a reliable measurement method. In a previous blog (Runtimes of Reads and Loops on Internal Tables) the exact measurement results for several reads from internal tables have been shown.

2. Measurement Program

One might assume that the measurement of a READ from an internal table is in principle very simple. You simply use the ABAP command GET RUN TIME:

GET RUN TIME FIELD start.
READ TABLE sort1 WITH TABLE KEY key1 = k1 key2 = k2 INTO wa1.
GET RUN TIME FIELD stop.

It is absolutely essential that the measurement does not contain operations other than the one you want to measure. This is really obvious but still often overlooked. Additionally, you will soon recognize that the results of such simple measurements show huge variation and do not match the expected behavior. The following program has all the ingredients to measure internal table operations in a reliable way.
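The same pattern can be sketched in Python to stress the one rule that matters here: nothing but the measured statement may stand between start and stop. The dictionary and the names sort1, k1, k2 are invented stand-ins for the ABAP table and keys, and `perf_counter_ns` stands in for GET RUN TIME:

```python
import time

sort1 = {("A", 1): "payload"}          # stand-in for the internal table
k1, k2 = "A", 1

start = time.perf_counter_ns()
wa1 = sort1[(k1, k2)]                  # ONLY the read sits between start and stop
stop = time.perf_counter_ns()

elapsed_us = (stop - start) / 1000.0   # a single shot like this scatters heavily
```

A single shot sits right at the timer's resolution; the parameters discussed in the next section exist precisely to tame that.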
How to Measure Operations on Internal Tables
Posted by Siegfried Boes in ABAP Testing and Troubleshooting on Nov 9, 2007

*&---------------------------------------------------------------------*
*& Report Z_ITAB_TEST
*&---------------------------------------------------------------------*
* Measures runtimes of reads from internal tables
* for hashed table with table key and
* for sorted table with table key
*
* Several variation parameters are built in:
* N_max    : to cover different table sizes 10, 20, 50, ... 10,000
* Pre-Read : to exclude first executions from measurement
* I_max    : to increase time resolution
* S_max    : to get better statistics
* L_max    : to read lines from different locations in the table
*
* Last change: Nov 2007
*----------------------------------------------------------------------*
REPORT zsb_itab_test LINE-SIZE 220.

TYPES: BEGIN OF st_tab,
         keyfield(30) TYPE n,
         c(2)         TYPE n,
         ctext(168)   TYPE c,
       END OF st_tab.

TYPES: tab  TYPE STANDARD TABLE OF st_tab WITH KEY keyfield c,
       sort TYPE SORTED TABLE OF st_tab WITH UNIQUE KEY keyfield c,
       hash TYPE HASHED TABLE OF st_tab WITH UNIQUE KEY keyfield c.

DATA: n_itab      TYPE tab,
      s_itab      TYPE tab,
      c10(10)     TYPE c,
      c50(50)     TYPE c,
      c250(250)   TYPE c,
      c1000(1000) TYPE c,
      c5000(5000) TYPE c,
      textx       TYPE st_tab-ctext,
      j_max       TYPE i VALUE '20',
      n_i         TYPE i,
      nn          TYPE i,
      ll          TYPE i,
      l_inc       TYPE i,
      l10         TYPE i,
      start       TYPE i,
      stop        TYPE i,
      t_i         TYPE p DECIMALS 3,
      t1_s        TYPE p DECIMALS 3,
      t2_s        TYPE p DECIMALS 3,
      t3_s        TYPE p DECIMALS 3,
      t1_s_min    TYPE p DECIMALS 3,
      t2_s_min    TYPE p DECIMALS 3,
      t3_s_min    TYPE p DECIMALS 3,
      t1_l        TYPE p DECIMALS 3,
      t2_l        TYPE p DECIMALS 3,
      t3_l        TYPE p DECIMALS 3,
      tsum1_l     TYPE p DECIMALS 3,
      tsum2_l     TYPE p DECIMALS 3,
      tsum3_l     TYPE p DECIMALS 3,
      t1_n        TYPE p DECIMALS 3,
      t2_n        TYPE p DECIMALS 3,
      t3_n        TYPE p DECIMALS 3.

c10 = '1234567890'.
CONCATENATE c10 c10 c10 c10 c10 INTO c50.
CONCATENATE c50 c50 c50 c50 c50 INTO c250.
CONCATENATE c250 c250 c250 c250 INTO c1000.
CONCATENATE c1000 c1000 c1000 c1000 c1000 INTO c5000.
textx = c5000.
*----------------------------------------------------------------------*
PARAMETERS: n_max   TYPE i DEFAULT '10',
            preread TYPE c AS CHECKBOX,
            i_max   TYPE i DEFAULT '1',
            s_max   TYPE i DEFAULT '1',
            l_max   TYPE i DEFAULT '1'.

START-OF-SELECTION.

FORMAT COLOR COL_KEY INTENSIFIED ON.
WRITE: / ' '.
WRITE AT 30 'Runtime (micro-seconds)'.
WRITE: / ' '.
WRITE AT 10 'N'.
WRITE AT 30 'Read_1'.
WRITE AT 50 'Read_2'.
WRITE AT 70 'Offset'.
FORMAT RESET.
*----------------------------------------------------------------------*
n_i = 0.
* 4. variation: size of internal tables -------------------------------
DO n_max TIMES.
  n_i = n_i + 1.
* fill internal tables of certain size:
  PERFORM fill_itab USING n_i CHANGING nn.
  CLEAR tsum1_l. CLEAR tsum2_l. CLEAR tsum3_l.
  CLEAR ll.
  l_inc = nn / ( l_max + 1 ).
* 3. variation: different locations -----------------------------------
  WHILE ( ll < l_max ).
    ll = ll + 1.
    l10 = ( l_inc * ll ) * 10.
    t1_s_min = '9999999.9'.
    t2_s_min = '9999999.9'.
    t3_s_min = '9999999.9'.
* 2. variation: the statistical repeats -------------------------------
    DO s_max TIMES.
* 1. variation: internal repeats --------------------------------------
      IF ( i_max EQ '1' ).
        IF ( preread IS INITIAL ).
          PERFORM read_hashed_tabkey_1. t1_s = t_i.
          PERFORM read_sorted_tabkey_1. t2_s = t_i.
        ELSE.
          PERFORM read_hashed_tabkey_p. t1_s = t_i.
          PERFORM read_sorted_tabkey_p. t2_s = t_i.
        ENDIF.
      ELSE.
        PERFORM read_hashed_tabkey. t1_s = t_i.
        PERFORM read_sorted_tabkey. t2_s = t_i.
        PERFORM empty_do.           t3_s = t_i.
      ENDIF.
* end of 1. variation -------------------------------------------------
      IF ( t1_s < t1_s_min ). t1_s_min = t1_s. ENDIF.
      IF ( t2_s < t2_s_min ). t2_s_min = t2_s. ENDIF.
      IF ( t3_s < t3_s_min ). t3_s_min = t3_s. ENDIF.
    ENDDO.
* end of 2. variation -------------------------------------------------
    t1_l = t1_s_min. t2_l = t2_s_min. t3_l = t3_s_min.
    tsum1_l = tsum1_l + t1_l.
    tsum2_l = tsum2_l + t2_l.
    tsum3_l = tsum3_l + t3_l.
  ENDWHILE.
* end of 3. variation: location average -------------------------------
  t1_n = tsum1_l / l_max.
  t2_n = tsum2_l / l_max.
  t3_n = tsum3_l / l_max.
  FORMAT COLOR COL_NEGATIVE.
  WRITE: / nn.
  WRITE AT 30 t1_n.
  WRITE AT 50 t2_n.
  WRITE AT 70 t3_n.
  FORMAT RESET.
ENDDO.
*----------------------------------------------------------------------*
* fill_itab: fills standard table
* Note: the filled table should have no special order
*----------------------------------------------------------------------*
FORM fill_itab USING n_i TYPE i CHANGING nn TYPE i.

  DATA: itab  TYPE tab,
        wa1   TYPE st_tab,
        count TYPE i.

* predefined sizes
  IF ( n_i GE 10 ).   nn = 10000.
  ELSEIF ( n_i = 9 ). nn = 5000.
  ELSEIF ( n_i = 8 ). nn = 2000.
  ELSEIF ( n_i = 7 ). nn = 1000.
  ELSEIF ( n_i = 6 ). nn = 500.
  ELSEIF ( n_i = 5 ). nn = 200.
  ELSEIF ( n_i = 4 ). nn = 100.
  ELSEIF ( n_i = 3 ). nn = 50.
  ELSEIF ( n_i = 2 ). nn = 20.
  ELSEIF ( n_i = 1 ). nn = 10.
  ENDIF.

  REFRESH itab. REFRESH n_itab. REFRESH s_itab.
  CLEAR wa1. CLEAR count.
*------------------------------------------------
* itab is built sorted!
  DO nn TIMES.
    count = count + 1.
    wa1-keyfield
The explanation of the parameters, their effect and their proper setting is discussed step by step below.

3. The Effect of the Parameters

3.a. Variation of the Size n_max

The runtime of an operation on an internal table depends on the number of lines in the table, the machine power, the table width and other factors. So if we measure only one fixed table of size n, for example n = 1000, then we do not learn much. We are mainly interested in the dependence of the runtime on the size n of the internal table, while all other parameters are kept unchanged. Therefore a variation of n is included in the test program. The effect can be seen by running the test program with n_max = 10, pre-read off, i_max = 1, s_max = 1, l_max = 1, i.e. in the default setting. For simplicity, 10 values have been predefined to cover the range from 10 to 10,000. Execute the tests several times to check whether this setting already leads to reliable data or not. The results are shown as black lines in figures 1 and 2. It is obvious that there is a lot of variation between the measurements. Also, the dependence on the table size is far from what we expect. So before we draw wrong conclusions, let us check whether the measurement can be improved any further.

3.b. Pre-read: The Cost of Initial Reads

The strangest effect in these first measurements is the rather strong increase of the runtime with the number of lines in the table: for the hashed table we expect no increase, and for the sorted table maybe a small one. It seems that a first read needs much more time than the subsequent ones, which are therefore a better measure for our needs. For this reason a pre-read was added, i.e. 20 reads on the table are executed before the actual measurement is done. The effect can be seen by running the test program with n_max = 10, pre-read on, i_max = 1, s_max = 1, l_max = 1. The results are shown as orange lines in figures 1 and 2.
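The pre-read idea translates directly into other languages: perform a few unmeasured accesses first, so that one-time costs of the very first touch stay out of the measured window. A Python sketch with invented names; the 20 warm-up reads mirror the program above:

```python
import time

table = {i: str(i) for i in range(10_000)}   # stand-in for the internal table

for _ in range(20):                  # pre-read: deliberately NOT timed
    _ = table[5000]

start = time.perf_counter_ns()
value = table[5000]                  # the measured read now hits warm structures
stop = time.perf_counter_ns()
```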
The results are now much smaller than those of the first test, but there is still a lot of variation between the measurements.

3.c. Time Resolution: Repeated Execution i_max

The measurements are now in the range of a few microseconds and therefore extremely close to the time resolution of GET RUN TIME, which is one microsecond. So measuring one execution is not reliable; the operation must be repeated several times to get runtimes in the range of 50 or more microseconds. This can be done by adding a DO ... ENDDO loop. The cost of the empty DO ... ENDDO must be deducted from the measurement.

GET RUN TIME FIELD start.
DO i_max TIMES.
  READ TABLE sort1 WITH TABLE KEY key1 = k1 key2 = k2 INTO wa1.
ENDDO.
GET RUN TIME FIELD stop.

The effect can be seen by running the test program with n_max = 10, pre-read on, i_max = 1000, s_max = 1, l_max = 1. Note that i_max was increased until the results no longer changed. The results are shown in figures 1 and 2 as green lines. They are much smaller than the previous results; therefore the detail view was added. There is still a bit of variation in the results.

3.d. Repeats for Better Statistics s_max

To reduce the variation of the results even further, it helps to repeat the measurements several times. It can be assumed that the variations are caused by some uncontrollable effects which have only a negative impact, i.e. they can make the execution slower but not faster. Therefore we do not average over the different executions but use the fastest execution out of several measurements. This can be done by running the test program with s_max larger than 1, i.e. with n_max = 10, pre-read on, i_max = 1000, s_max = 20, l_max = 1. The results are shown in figures 1 and 2 as blue lines. The variation decreases again.

3.e. Location Dependence l_max

It is obvious that an operation like the sequential read, which scans the table from start to end, will find an entry at the beginning faster than one at the end. In this case it is also obvious that the runtime for an entry in the middle of the table equals the averaged runtime. However, in the case of a read using a binary search, it is not clear which line would represent the averaged runtime. In general it is much better to average over the runtimes of several reads accessing different parts of the table. This can be done by running the test program with l_max larger than 1, i.e. with n_max = 10, pre-read on, i_max = 1000, s_max = 20 and l_max = 20. The program accesses l_max different lines equidistantly distributed over the whole table. The results are shown in figures 1 and 2 as red lines.
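Putting 3.b through 3.e together: warm up, repeat the operation i_max times, deduct the cost of an empty loop, keep the fastest of s_max runs, and average over l_max probe locations. The following Python sketch shows the combined recipe on a plain dict; every name is invented, and it is a methodological illustration, not a port of the ABAP program:

```python
import time

def measure_lookup_us(table, keys, i_max=1000, s_max=20):
    """Best-of-s_max runtime per lookup in microseconds, averaged over the probe keys."""
    per_key = []
    for key in keys:                        # l_max locations spread over the table
        _ = table[key]                      # pre-read: keep first-touch costs out
        best = float("inf")
        for _ in range(s_max):              # keep the fastest run: noise only slows us down
            start = time.perf_counter_ns()
            for _ in range(i_max):          # repeat to rise above the timer resolution
                _ = table[key]
            stop = time.perf_counter_ns()
            e_start = time.perf_counter_ns()
            for _ in range(i_max):          # ... and deduct the empty loop's own cost
                pass
            e_stop = time.perf_counter_ns()
            best = min(best, (stop - start) - (e_stop - e_start))
        per_key.append(best / i_max / 1000.0)
    return sum(per_key) / len(per_key)

table = {i: i for i in range(10_000)}
probes = list(range(500, 10_000, 500))      # equidistant locations, like l_inc in the report
t_us = measure_lookup_us(table, probes)     # typically a small fraction of a microsecond
```

Taking the minimum over repeats rather than the mean follows the reasoning of 3.d: interference can only slow a run down, so the fastest run is the cleanest estimate.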
These are the best results and resemble our expectations quite well.

4. Results

Figure 1 and Detail: Averaged runtime (in microseconds) of the hashed read with table key for different table sizes N, according to the method explained above. The different colors display different settings of the parameter values (n_max, pre-read, i_max, s_max, l_max). The black lines have only the table size variation (10, off, 1, 1, 1), the orange lines add a pre-read before the measurements (10, on, 1, 1, 1), the green lines an internal repeat because of the restricted time resolution (10, on, 1000, 1, 1), the blue lines also the statistical repeats (10, on, 1000, 20, 1), and the red lines also a location variation (10, on, 1000, 20, 20).
Figure 2 and Detail: Averaged runtime (in microseconds) of the sorted read with table key for different table sizes N, according to the method explained above. The different colors display different settings of the parameter values (n_max, pre-read, i_max, s_max, l_max). The black lines have only the table size variation (10, off, 1, 1, 1), the orange lines add a pre-read before the measurements (10, on, 1, 1, 1), the green lines an internal repeat because of the restricted time resolution (10, on, 1000, 1, 1), the blue lines also the statistical repeats (10, on, 1000, 20, 1), and the red lines also a location variation (10, on, 1000, 20, 20).
Measuring operations on internal tables seems in principle very simple. However, to get really reliable data, a bit more effort must be put into the measurement. How this should be done was discussed here.

Further Reading: Performance-Optimierung von ABAP-Programmen (in German!)

More information on performance topics can be found in my new textbook on performance (published Nov 2009). Please note, however, that it is currently only available in German.

Chapter Overview:
1. Introduction
2. Performance Tools
3. Database Know-How
4. Optimal Database Programming
5. Buffers
6. ABAP - Internal Tables
7. Analysis and Optimization
8. Programs and Processes
9. Further Topics
10. Appendix

In the book you will find detailed descriptions of all relevant performance tools. An introduction to database processing, indexes, optimizers etc. is also given. Many database statements are discussed and different alternatives are compared. The resulting recommendations are supported by ABAP test programs which you can download from the publisher's webpage (see below). The importance of the buffers in the SAP system is discussed in chapter five. Of all ABAP statements, mainly the usage of internal tables is important for good performance. With all the presented knowledge you will be able to analyze your programs and optimize them. The performance implications of further topics, such as modularization, work processes, remote function calls (RFCs), locks and enqueues, update tasks and parallelization, are explained in the eighth chapter. Even more information - including the test programs - can be found on the webpage of the publisher. I would especially recommend the examples for the different database statements. The file with the test programs (K4a) and the overview with the input numbers (K4b) can be used even if you do not speak German!
Comments:

Rui Pedro Dantas (Nov 9, 2007): Very good posts, both this and the previous one you linked to. I have one doubt that maybe you already answered somewhere else: why did you prefer to use GET RUN TIME in the code instead of measuring with SE30? Was it not reliable?

Siegfried Boes (Nov 9, 2007, in response to Rui Pedro Dantas): Thank you for your interest. I took GET RUN TIME for two reasons: the SE30 has more overhead than GET RUN TIME, and for usability - how should I extract the results of the different measurements from the SE30 and display them in the result list? Siegfried

Sreenivas Mamidi (Nov 11, 2009): Hi, your blogs are extremely useful. Thank you very much for the clear explanations. Cheers, Sreenivas.