Document Type: Tutorial | NI Supported: Yes | Publish Date: Aug 21, 2009

Managing Large Data Sets in LabVIEW


Overview

One of the great strengths of LabVIEW is automatic memory management. This memory management allows the user to easily create strings, arrays, and clusters with none of the worries C/C++ users constantly have. However, this memory management is designed to be absolutely safe, so data is copied quite frequently. This normally causes no problems, but when the data size in a wire starts creeping into the megabyte range, copies start causing memory headaches, culminating in an out-of-memory error. While LabVIEW is not optimized for large data wires, it can be used with large data sets, provided the programmer knows a few tricks and is prepared for large block diagrams. This paper explains a few of these tricks. In particular, it explains the following:

- How to determine when LabVIEW creates a data copy
- How to minimize the effects of copies
- How to reduce the data used to plot graphs
- How to store large data sets in RAM without making copies
- How to read and write files over 2 GBytes in size using pre-8.0 versions of LabVIEW
- How to interact with 64-bit DLLs using pre-8.0 versions of LabVIEW

Table of Contents
1. Reduce Data Copies
2. Minimize Memory Problems with Chunking
3. Fast Data Display with Decimation
4. Methods for Large Data Storage
5. Breaking the 2 GB File Size Barrier (LabVIEW 7.1 and earlier)
6. Interfacing with 64-Bit DLLs (LabVIEW 7.1 and earlier)
7. Sample Code

Reduce Data Copies

Since LabVIEW is a dataflow language, copies are an integral part of how the language works. Any time there is a fork in a wire, a copy may be made. LabVIEW is fairly intelligent and usually makes a copy only when necessary. However, LabVIEW is also safe: if in doubt, it makes a copy. The programmer needs a way to determine whether a copy of a data buffer is being made, especially if that buffer contains a huge amount of data.

The easiest way to do this is the buffer viewer. This is a native feature of LabVIEW 7.1 and higher and is available at ni.com for LabVIEW 7.0 users. If you are a 7.0 user, use the "Show Buffer" utility available from KnowledgeBase 2XQEOODT, "Determining When and Where LabVIEW Creates a New Buffer." Download the ZIP file and unzip it into your directory of choice. Close LabVIEW if it is open, then copy ShowBuffers.llb and the ShowBuffersHelp directory into your <LabVIEW7.0>\project directory.

To use the buffer viewer, open LabVIEW and the block diagram of the VI you want to check for buffer copies. In LabVIEW 8.x, select Tools»Profile»Show Buffer Allocations from the LabVIEW menu. In LabVIEW 7.1, select Tools»Advanced»Show Buffer Allocations. In LabVIEW 7.0, select Tools»Show Buffers. When the dialog appears, check or uncheck the types of buffer allocations you want to see, then look at the block diagram of your opened VI. The black dots show buffer copies and creations; they appear only if the VI is unbroken and compiled. Refer to the buffer viewer documentation for more information: open the LabVIEW help by pressing Ctrl-H in a LabVIEW development session and search for "Show Buffer".

Use the VI profiler to find which VIs produce memory problems. In LabVIEW 8.x, start the VI profiler from Tools»Profile»Performance and Memory. In previous versions of LabVIEW, use Tools»Advanced»Profile VIs.

Select Profile Memory Usage, and then select Memory Usage to turn on memory profiling. Refer to the LabVIEW documentation for details of how to use the profiler: open the LabVIEW help (Ctrl-H) and search for "Profile VIs".

As a final check, combine execution highlighting with your OS's memory monitor for an excellent sanity check of your memory usage. This method also picks up memory copies that the buffer viewer does not (for example, wire forks). Set your data size to something large (1 MByte or bigger) and single-step through your code, keeping an eye on the memory monitor. Every data copy increments the memory monitor as the code executes. For Windows, the memory monitor is Task Manager. For Mac OS X, it is Activity Monitor. For Linux, use top from the command line or one of its graphical variants.

Now that you know how to see copies, what can you do to avoid them? Most of the following tips are contrary to good LabVIEW design practice, so National Instruments recommends you use them only when necessary.

- Use simple arrays. Extracting a data array for processing from waveform or dynamic data may make an extra copy.
- Use large block diagrams. Going into subVIs may generate a copy.
- When routing data through subVIs, make sure all front panel terminals are on the root of the block diagram, not inside a case structure, loop, or other container. The memory copy algorithms generate more copies if the terminals are not on the root.
- Avoid routing data through loops unless absolutely necessary. If you must route data through a loop, use shift registers; you may make a copy at the first iteration. If you use tunnels, you may make a data copy every time the loop iterates.
- Use In-Place Element structures to avoid making copies when modifying arrays.
- The newer your copy of LabVIEW, the better off you are. LabVIEW 6i produces a data copy in almost every instance listed above; LabVIEW 7.1 makes far fewer copies.

See Also:
Determining When and Where LabVIEW Creates a New Buffer
LabVIEW Help: Using the Profile Performance and Memory Window

Minimize Memory Problems with Chunking

Since data copies are a fact of life with LabVIEW, you can minimize their impact by making the data size of your wires as small as possible. You can usually accomplish this by chunking: breaking a large data set into smaller sets when transporting it from one place to another. This way, the copies LabVIEW makes do not adversely affect your memory usage. They do adversely affect your throughput, so minimizing them is still a good idea.

The following example demonstrates this concept. You have a new digitizer with 256 MBytes of memory. You have just filled that memory and need to copy the data to disk. The easy way to do this is to fetch it all, and then save it to disk with a single call. Depending upon how well you do this, you will generate two or more extra copies of your data, and on most computers LabVIEW does not handle a request for 768 MBytes of memory very well. A better approach is to loop, fetching 500 kBytes of data at a time from the digitizer and streaming it to disk, as in the sketch below. Your memory hit is now down to 1.5 MBytes, which is well within the limits of most computers. A side benefit is that you save the enormous amount of time it would take LabVIEW to allocate the large block of memory. Streaming 256 MBytes of data to disk should take 15 seconds or less on most modern computers; it could easily take LabVIEW that long just to allocate the 768 MBytes of RAM needed to do it the easy way.
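To make the chunking pattern concrete, here is a minimal C sketch of the fetch-and-stream loop described above. The function fetch_chunk_from_digitizer is a purely illustrative placeholder, not part of any NI driver API; in LabVIEW the same structure is simply a loop containing the driver's fetch VI and the file write primitive.

/* Sketch: stream a large acquisition to disk in small chunks so that only
   one chunk (plus any copies of it) ever lives in memory at a time. */
#include <stdio.h>
#include <stdlib.h>

#define CHUNK_BYTES (500 * 1024)                 /* 500 kBytes per fetch */

/* Hypothetical driver call: fills 'dest' with up to 'max' bytes and returns
   the number of bytes fetched, or 0 once the digitizer memory is empty. */
extern size_t fetch_chunk_from_digitizer(void *dest, size_t max);

int stream_acquisition_to_disk(const char *path)
{
    FILE *f = fopen(path, "wb");
    if (f == NULL) return -1;

    char *chunk = malloc(CHUNK_BYTES);           /* the only large buffer ever allocated */
    if (chunk == NULL) { fclose(f); return -1; }

    size_t got;
    while ((got = fetch_chunk_from_digitizer(chunk, CHUNK_BYTES)) > 0) {
        if (fwrite(chunk, 1, got, f) != got) {   /* append sequentially, no seeking needed */
            free(chunk); fclose(f); return -1;
        }
    }

    free(chunk);
    fclose(f);
    return 0;
}

The peak memory use is one chunk rather than the whole acquisition, which is the entire point of chunking.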
Fast Data Display with Decimation

In many interactive applications, the only thing you want to do with your data is show it to the user. There may be a real reason to display 5 million data points, but this amount of data is far beyond the capabilities of most displays. The average LabVIEW graph is on the order of 300 to 1000 pixels wide; five million points is three orders of magnitude more than you can actually see on a waveform graph.

Data decimation is the answer to this problem. The user of the graph would like to see a pixel-accurate version of the huge data set. If the data is a single glitch in a 5-million-point flat line, the pixel-accurate plot is a horizontal line with a single spike one pixel wide. If the data is a sine wave with more cycles than the pixel width of the screen, the pixel-accurate plot is a solid band across the screen, with no aliasing. A max-min decimation algorithm solves both of these use cases, and most others, quite well.

Max-min decimation is decimation in which the maximum and minimum data points of each decimation interval are used to represent that interval. Simple decimation uses the first point of each decimation interval as the data point for the interval. Simple decimation leads to aliasing artifacts, so it should not be used unless time is of the utmost importance and accuracy is not.

To implement max-min decimation, first determine the pixel width of your graph by querying the graph's Plot Area»Size»Width property. To reduce artifacts, you need at least two decimation intervals per pixel width, so multiply the graph pixel width by two to get the nominal number of intervals to divide your data into. Divide your data length by this number and round up to the nearest integer; this gives you the chunk size for decimation. For each chunk, find the maximum and minimum points and order them in the same order they occurred in the data set. Do not worry that your last chunk has fewer points than the rest; the difference is less than a pixel wide and cannot be seen on the computer screen. String all the max and min data together and plot.

You will have four points per pixel width on the screen. This allows a single, pixel-wide spike to appear without bleeding over into adjacent pixels, and the max-min algorithm ensures you always see the peaks of the data, giving you the solid band a high-frequency sine wave should produce. All of this occurs with much less data plotted to the graph, resulting in much higher speeds. However, you will need to manually set the X values of your X axis using property nodes, because this method seriously changes the X values of the data. Similarly, cursor operations should reference the original data. A C sketch of the decimation loop appears below.
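For reference, here is a minimal C sketch of the max-min decimation loop just described. The function name, the plain double-array representation, and the caller-supplied output buffer are illustrative assumptions; they are not taken from the tutorial's VIs.

/* Max-min decimation: keep the minimum and maximum of each decimation
   interval, in the order they occurred, giving roughly four points per
   pixel when num_intervals is twice the graph's pixel width. */
#include <stddef.h>

/* 'out' must hold at least 2 * num_intervals elements; returns the number of points written. */
size_t maxmin_decimate(const double *data, size_t n, size_t num_intervals, double *out)
{
    if (n == 0 || num_intervals == 0) return 0;

    size_t chunk = (n + num_intervals - 1) / num_intervals;   /* chunk size, rounded up */
    size_t written = 0;

    for (size_t start = 0; start < n; start += chunk) {
        size_t end = (start + chunk < n) ? start + chunk : n; /* last chunk may be short */
        size_t imin = start, imax = start;
        for (size_t i = start + 1; i < end; i++) {
            if (data[i] < data[imin]) imin = i;
            if (data[i] > data[imax]) imax = i;
        }
        if (imin <= imax) {                                    /* preserve order of occurrence */
            out[written++] = data[imin];
            out[written++] = data[imax];
        } else {
            out[written++] = data[imax];
            out[written++] = data[imin];
        }
    }
    return written;
}

Calling this with num_intervals set to twice the Plot Area width reproduces the two-intervals-per-pixel rule given above.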

Here is an example: processing the raw data set shown on the left below with max-min decimation produces the graph shown on the right.

[Figure: a large raw data set (left) and its max-min decimated plot (right).]

To see the speed improvements in action, open GLV_TypicalGenerateAndDisplay.vi from GigaLabVIEW.llb. Open your OS's memory monitor, set the number of points to 1 million or more, and run it. Note the execution time and memory use, then close the VI. Now open and run GLV_GigaLabVIEWGenerateAndDisplay.vi with the same number of points and note the differences in time and memory usage. GLV_TypicalGenerateAndDisplay.vi generates the entire data set at once using the standard LabVIEW sine wave generator, and then plots it to the screen. GLV_GigaLabVIEWGenerateAndDisplay.vi generates the data in chunks, decimates these chunks, and throws the original data away. It also uses a lower-level sine wave generator that produces a simple array. When the waveform datatype is used for the graph, the number of points is low enough that a copy is not very costly; still, converting this VI from the waveform-datatype sine generator to the lower-level, simple-array generator resulted in a 20% increase in speed in LabVIEW 7.0.

Methods for Large Data Storage

Sometimes you need to store large data sets in memory. To do this without memory problems, you need a storage mechanism that lets you keep a single copy of the data and access it in chunks, so the data can be transported without a large memory hit. One common solution is a functional global, also called a shift-register global or LabVIEW 2-style global. Another solution is a single-element queue.

To implement the functional global approach, create a VI that contains a single-pass loop: a FOR loop with one iteration, or a WHILE loop hardwired to exit after one iteration. Use an uninitialized shift register to hold your data, and use the array functions to read, write, and resize it; these functions operate in place and do not cause data copies. It is usually a good idea to store start and increment values in other shift registers in the same loop, and you can cache the number of points the same way. Such a VI gives you the functionality of a C/C++ pointer to a buffer.

Make the database VI reentrant. To use it, start it with the VI Server; this produces a reference to the VI, which can be passed anywhere. If you want another buffer, start another one with the VI Server; since the VI is reentrant, another instance will be created.

You can now pass this VI reference around and use it anywhere. Note that using a VI reference causes every access of the data to run in the UI thread, which slows program execution. You can get around this problem by creating a different VI for every buffer and making each VI non-reentrant.

Implementing the queue approach is just as easy. Create a queue with a single element that contains your data. Any time you want to access the data, dequeue the element; this blocks other parts of your program from simultaneous access (a good thing). Do your data operation, and then enqueue the element again. The only thing you need to pass around is the queue reference. In fact, if you name the queue, you can obtain a reference to it at any point by specifying the name in the Obtain Queue primitive. Creating multiple data objects is as easy as creating multiple queues. This approach is typically faster than the functional global approach.

GLV_WaveformBuffer.vi in GigaLabVIEW.llb is an example of the functional global concept. To see the concept in action, open GLV_TypicalMemoryStoreAndBrowse.vi, set the number of points to 100 thousand or more, and run the VI. Now open GLV_GigaLabVIEWMemoryStoreAndBrowse.vi and do the same. Note the difference in responsiveness and memory usage; increase the number of points to one million or more to see major differences. GLV_GigaLabVIEWMemoryStoreAndBrowse.vi uses the shift-register database in GLV_WaveformBuffer.vi to hold the data. The database also holds a frame buffer, a pre-decimated array of the full data that can be reused without recalculation. It also uses the chunking and display algorithms introduced in the previous example.

GLV_GigaLabVIEWMemoryStoreAndBrowseQ.vi in GigaLabVIEW.llb is an example of the single-element queue concept. It uses standard object-oriented techniques to create and manipulate the queues. Its use is identical to GLV_GigaLabVIEWMemoryStoreAndBrowse.vi; the different shift registers in GLV_WaveformBuffer.vi are implemented as separate queues for this example. Which method you use is largely a matter of taste and programming style. A rough C analogue of the pattern appears below.
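For readers more familiar with text-based languages, the functional global is roughly analogous to a C module that owns a single static buffer and exposes chunked, in-place access to it. The sketch below is only an analogy under that assumption; the names are illustrative, and a real LabVIEW functional global uses an uninitialized shift register rather than static storage.

/* Rough C analogue of a functional global: one private copy of the data,
   read and written in chunks so the whole set is never duplicated. */
#include <stdlib.h>
#include <string.h>

static double *g_data = NULL;    /* plays the role of the uninitialized shift register */
static size_t  g_length = 0;

int buffer_resize(size_t new_length)
{
    double *p = realloc(g_data, new_length * sizeof *p);
    if (p == NULL && new_length != 0) return -1;
    g_data = p;
    g_length = new_length;
    return 0;
}

/* Copy one chunk out of the stored data; only 'count' elements ever move. */
int buffer_read(size_t offset, double *dest, size_t count)
{
    if (offset + count > g_length) return -1;
    memcpy(dest, g_data + offset, count * sizeof *dest);
    return 0;
}

/* Overwrite one chunk of the stored data in place. */
int buffer_write(size_t offset, const double *src, size_t count)
{
    if (offset + count > g_length) return -1;
    memcpy(g_data + offset, src, count * sizeof *src);
    return 0;
}

The single-element queue fills the same role, with the dequeue/enqueue pair additionally serializing access from parallel parts of the program.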
Now that you know how to create large data sets, how much memory can you expect to allocate? The answer depends on several factors. When LabVIEW allocates an array, it requests a contiguous memory section. If your memory is fragmented, you may get an out-of-memory error even though you still have hundreds of megabytes of free memory. You can work around this somewhat by allocating your data in chunks; if you write your repository VI(s) correctly, your access to the data does not change. LabVIEW uses signed 32-bit memory access, so your total memory will never exceed 2 GBytes. Windows uses unsigned 32-bit memory access, but reserves the high 2 GBytes for system use and allows program execution only in the low 2 GBytes (server versions are more flexible). In addition, system DLLs occupy most of the high quarter of the 2 GByte user data space. Thus, a practical limit on a Windows system is 1.0 to 1.5 GBytes. Different versions of LabVIEW fragment memory in different ways, which changes the maximum array size you can allocate. In LabVIEW 7.x, you can typically allocate slightly more than 1 GByte in a single array; LabVIEW 8.x, due to its larger feature set, only allows a maximum array size of about 800 MBytes.

Breaking the 2 GB File Size Barrier (LabVIEW 7.1 and earlier)

LabVIEW 8.0 introduced 64-bit file pointers, so the techniques in this section are not needed there. Earlier versions of LabVIEW use 32-bit, signed integers for file pointers, which directly limits the addressable size of a file to 2 GBytes. Streaming to disk at 10 MBytes/sec, which is easily done with NI digitizers, fills the 32-bit signed integer space in 3 minutes and 20 seconds. There are two direct options to overcome this problem.

The first is fairly simple. If you are using a Windows operating system with an NTFS-formatted disk partition (it must be Windows NT, 2000, or XP), you can simply write to disk using the LabVIEW write primitive. Do not wire anything to the offset input, and wire the position mode to "current". You can write until you run out of disk space. For a simple example, open GLV_StreamToDisk.vi in GigaLabVIEW.llb. This example saves an ascending sequence of double-precision floating-point numbers to disk. Set the amount of data on the front panel, run, and then sit back and relax while it stores data. The default chunk size, 65,000 bytes, was experimentally determined to be the speed optimum for Windows-based systems.

To read the data back, reverse the process: use the read primitive with no offset input. You can use the offset input up to the 2 GByte boundary; if you use it beyond that boundary, you will get an end-of-file error. However, if you simply read data sequentially from disk, you can read until the end of the file. The VI GLV_ReadFromDisk.vi shows this process. This trick works only for the Windows versions mentioned above, and it is not possible to seek to an arbitrary location above the 2 GByte boundary and read the data there.

This brings us to the second option: use a 64-bit file utility with LabVIEW. A good example is HDF5.

HDF5 is a binary, hierarchical file utility designed, written, and maintained by the National Center for Supercomputing Applications (NCSA). It is free for any sort of use, since it is funded by the US government. For full information, source code, and binaries of HDF5, visit http://hdf.ncsa.uiuc.edu/HDF5/. Using HDF5, or any other 64-bit utility, requires the ability to pass 64-bit numbers to the utility, which brings us to our last topic.

Interfacing with 64-Bit DLLs (LabVIEW 7.1 and earlier)

LabVIEW 8.0 introduced full support for 64-bit integers, so the techniques in this section are not needed there. For earlier versions, there are two options for interfacing with a 64-bit DLL. The first is to write a C/C++ wrapper that exposes only data structures LabVIEW can handle natively. Since this defeats the ease of use of LabVIEW, we will discuss the second option: use the Call Library Function Node and access the DLL directly, using a bit of digital sleight of hand.

You can represent a 64-bit number as a cluster of two 32-bit numbers. Mathematical operations between 64-bit numbers can then be coded using well-known algorithms for arbitrary-precision arithmetic. These algorithms are beyond the scope of this paper, but you can easily find them on the web; they are also covered in The Art of Computer Programming, Volume 2: Seminumerical Algorithms by Donald Knuth. A minimal example of one such operation appears below.
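As one concrete instance of those algorithms, here is a small C sketch of 64-bit unsigned addition on a value held as two 32-bit words, mirroring the cluster representation described above. The struct and function names are illustrative and are not taken from the tutorial's VIs.

/* A 64-bit unsigned value stored as two 32-bit words, plus addition with
   carry propagation from the low word into the high word. */
#include <stdint.h>

typedef struct {
    uint32_t hi;   /* most-significant 32 bits  */
    uint32_t lo;   /* least-significant 32 bits */
} u64_words;

u64_words u64_add(u64_words a, u64_words b)
{
    u64_words sum;
    sum.lo = a.lo + b.lo;                /* low words add modulo 2^32 */
    uint32_t carry = (sum.lo < a.lo);    /* wrap-around means a carry occurred */
    sum.hi = a.hi + b.hi + carry;
    return sum;
}

In LabVIEW, the same logic can be built from a few integer primitives operating on the two cluster elements.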

Now that you have a method to represent and do math on 64-bit numbers, how do you get them to the DLL? The easiest way is to type cast 64-bit entities to doubles, and then pass in a double whenever the DLL asks for a 64-bit integer. Since the type cast performs a binary image transform, similar to a union in C, all will be well, provided you have the high-order and low-order double words in the proper order.

[Figure: a cluster of the high and low 32-bit words being type cast to a double, showing the proper word order.]

The Call Library Function Node takes care of byte ordering for the particular platform. Because the double is a 64-bit entity, this method also works for arrays and even gives the right padding on architectures that require it, such as SPARC. So if the DLL has a function prototype of

int32 fooFunc(uint64 length, uint64 *elements)

the prototype you create in the Call Library Function Node looks like

long fooFunc(double length, double *elements)

You cannot use this trick to get the return value of a function. If the function you wish to use has the prototype

uint64 barFunc(void)

then older versions of LabVIEW have no way to access the full return value. LabVIEW can get only the bottom 32 bits, because function return values come back in processor registers, while items in the argument list are passed on the program stack. On the stack, the only thing that matters is that LabVIEW and the DLL use the same-sized object; for return values, integer and floating-point results come back in different registers, and LabVIEW has no way of accessing the top 32 bits of a returned 64-bit integer. A C/C++ wrapper is necessary. Using the above example, the wrapper is of the form

void barFuncWrapper(uint64 *barFuncData)
{
    *barFuncData = barFunc();
    return;
}
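To see why the double-for-integer substitution is safe, the sketch below reinterprets the same 64 bits as an integer and as a double, the C equivalent of the LabVIEW type cast described above. The value and names are illustrative; the point is only that the bit pattern, and therefore the integer the DLL sees, survives the round trip unchanged.

/* Reinterpreting 64 bits as a double and back does not disturb the value,
   which is why a double can stand in for a uint64 at the call library node. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    uint64_t offset = 5000000000ULL;                 /* a file offset beyond 2 GBytes */

    double as_double;
    memcpy(&as_double, &offset, sizeof as_double);   /* the "type cast": copy raw bits */

    uint64_t back;
    memcpy(&back, &as_double, sizeof back);          /* what the DLL recovers */

    printf("round-tripped offset: %llu\n", (unsigned long long)back);
    return 0;
}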

Fortunately, such a wrapper is usually not necessary. Two examples of interfacing to 64-bit DLLs are available in NI products; both interface to HDF5.

The HWS file utility is a C/C++ wrapper that was designed with LabVIEW interfacing in mind. Since HDF5 is very low-level and difficult to master, the HWS API puts a standard LabVIEW file I/O interface over the HDF5 complexity. HWS is currently available with the NI-HSDIO, NI-SCOPE, and NI-FGEN drivers, the Analog and Digital Waveform Editors, and any Driver CD dated August 2004 or later.

The sfpFile utility set is a LabVIEW utility that interfaces directly to HDF5 with as few C wrappers as possible. It is available from ni.com but is not supported by National Instruments. It embodies the principles of direct use of a 64-bit DLL from LabVIEW. Two example VIs from this utility set are included in GigaLabVIEW.llb. The first is H5Screate_simple.vi, a direct call to the HDF5 DLL with the prototype

int32 H5Screate_simple(int32 rank, const uint64 *dims, const uint64 *maxdims)

The LabVIEW Call Library Function Node prototype is

long H5Screate_simple(long rank, double *dims, double *maxdims)

The second is DU64_DBLToDU64.vi, an example of how to convert a double-precision floating-point number into a cluster of two 32-bit integers, and then cast it back into a double for passing to the HDF5 routines. Doubles provide a convenient way of keeping track of large file-pointer integers in LabVIEW, since their mantissa carries 52 bits of precision; because NTFS only has a 48-bit data space, this works well. Addition, subtraction, and multiplication of integer-valued floating-point numbers are exact as long as the results stay within that precision.

Note that HWS and sfpFile produce the same file format; only the API differs.

Sample Code

GigaLabVIEW.llb contains all the sample code mentioned in this tutorial. If you wish, you may also download the HDF5 DLLs to prevent the HDF5 examples from looking for nonexistent libraries; place them in your system directory.

Related Links:
LabVIEW Windows Routines for Data Compression
HDF5 Home Page
Can I Edit and Create Hierarchical Data Format (HDF5) files in LabVIEW? (Download sfpFile)
Determining When and Where LabVIEW Creates a New Buffer (Download Buffer Viewer)

Downloads:
giga_labview.llb
hdf5_2_lv.dll
hdf5dll.dll
hdf5_copyright.htm

Legal

This tutorial (this "tutorial") was developed by National Instruments ("NI"). Although technical support of this tutorial may be made available by National Instruments, the content in this tutorial may not be completely tested and verified, and NI does not guarantee its quality in any way or that NI will continue to support this content with each new revision of related products and drivers. THIS TUTORIAL IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND AND SUBJECT TO CERTAIN RESTRICTIONS AS MORE SPECIFICALLY SET FORTH IN NI.COM'S TERMS OF USE (http://ni.com/legal/termsofuse/unitedstates/us/).

© 2009 National Instruments Corporation. All rights reserved.
