Complex applications today involve multiple processes, multiple threads of control, distributed processing, thread pools, event handling, messages. The typical approach is to trace the behavior of the systems and track how the different incoming messages are processed throughout the system. This paper shows how dynamic analysis can be used to automatically identify the transactions, stages and shared queues in Java programs.
Original title: Automatic detection of queues and event handlers in message processing systems
Copyright:
Attribution Non-Commercial (BY-NC)
Automatic detection of internal queues and stages in message processing systems
Suman Karumuri, Steve Reiss, Brown University. {suman,spr}@cs.brown.edu
Abstract

Complex applications today involve multiple processes, multiple threads of control, distributed processing, thread pools, event handling, messages. The behaviors and misbehaviors of these nondeterministic, message-based systems are difficult to capture and understand. The typical approach is to trace the behavior of the systems and track how the different incoming messages are processed throughout the system. While messages between processes can be captured automatically at the network or library level, tracing the message processing within a system, which is often more complex and error-prone, requires the programmer to manually instrument the code by identifying the different message handlers, thread states, processing stages, and shared queues accurately and completely. In this paper we show how dynamic analysis can be used to automatically identify the transactions, stages and shared queues in Java programs as a prelude to trace-based comprehension.

1 Introduction

Today's complex systems typically handle a series of external requests or messages as a whole or by splitting the requests into separate computational steps and processing each of these stages using mechanisms such as thread pools. The overall behavior of such systems can be nondeterministic, unpredictable, and difficult to understand. We call such systems message processing systems (MPS), call the requests transactions, call the various processing steps stages, and call the mechanism used to save and allocate stages to the different threads queues. In this paper, we present new program analysis techniques to detect queues, stages and transactions in a message processing system using minimal programmer input, the list of messages in a system.

As an example of an MPS, consider a simple web crawler we wrote for a class project. The crawler uses a pool of threads to retrieve a queue of pages. Each thread takes a page off the queue and processes it in the following order: check whether the URL is valid, download the HTML page if the URL is valid, and finally add the new URLs found in the downloaded page to the shared queue for further processing. Here there is a single stage, processing a URL, and a single queue through which the worker threads exchange messages containing URLs. Some other examples of MPSs we have looked at are a peer-to-peer system, the Haboob HTTP server, the Hadoop Map-Reduce framework, a back end for code search, a linguistic search web service, the enterprise messaging system ActiveMQ, the Rupy and Jetty HTTP servers and the Jabber tools JBother and Openfire.

The best way to understand the behavior of an MPS is by analyzing a trace of its behavior, a trace that can be both precise and generated with relatively low overhead. However, such traces are difficult to obtain and thus are used only as a last resort. The main problem here is that procuring such a trace requires significant work on the part of the programmer, since the code to generate the trace needs to be inserted manually in many portions of the system. Here it is easy to overlook a queue or processing stage and to insert bugs into the system along with the instrumentation code.

The manual instrumentation of a message processing system is carried out in two steps: identifying the transactions, stages and queues in the MPS, and manually adding trace code in the form of instrumentation based on these identifications. While frameworks like X-Trace [6] ease the latter step, no tools currently exist to automate the former step.

Prior work in the area of automatic trace generation of message processing systems has concentrated on the external messages rather than the internals of the systems. These systems capture and interpret the data at well defined interfaces like network sockets (Causeway [2]), Unix system calls (BorderPatrol [4]), the libevent entries (WhoDunIt [3]) or J2EE calls (Pinpoint [5]). Since these interfaces are standard and well documented, the queues and stages in those systems can be readily identified. These techniques do not work on the various internal queues and threading mechanisms that most complex MPSs utilize.

We are developing a system that attempts to completely automate the process of tracing message processing systems to facilitate understanding their behavior. In this paper we describe our approach to the problem of automatically determining what instrumentation is needed to generate appropriate traces. Our approach uses minimal programmer input and a combination of static and dynamic analysis to obtain the appropriate information.

2 Identifying internal queues and stages

Our first step in automating the instrumentation process was to understand the common techniques used in MPSs. By understanding what the programmer would need to know to instrument the wide range of message processing systems we looked at (described in the previous section), we determined that we could assume the following traits:

(a) All messages in the system can be identified by a small set of Java types.
(b) Each message in the system can be uniquely identified.
(c) The shared queues are implemented using Java collections containing objects of the message type or by a field (a queue of size 1) of the message type.
(d) Each stage waits for the next event that it can process from the queue. Once it finds an event, it removes the event from the queue, processes it and waits for another event.
(e) The start stage of a transaction always creates a message for the next stage to process, without retrieving a new message. The last stage of every transaction retrieves a message from a bin but doesn't insert any messages into the shared bin for further processing.
(f) Every stage, except the starting stage of a transaction, extracts a message from the shared queue and processes it. While processing the message, the stage may update the same message or may create a new message based on the existing message, which is inserted into the shared queue for further processing.

While most message processing systems will satisfy these assumptions, it should be relatively easy to augment non-conforming systems either manually or automatically. For example, if a system lacks a unique message identity, either a new field could be inserted in the message objects or the object hash code could be used as an approximate identity for each message.

Our system requires two simple inputs from the programmer. The first is the name of the Java class or classes that represent messages in the system. The second is a sample run of the program from which we can obtain dynamic information. Using these, our system identifies the shared bins as those fields that contain a message or a Java collection of messages and that have corresponding get-set operations (implicit or explicit) that are performed by different threads, ignoring initializations. Initializations typically involve fields local to the worker threads that are initialized once from the main thread. To identify and thus ignore such fields, we only identify those fields as queues when they are used to exchange more messages than the maximum number of threads in the JVM at any point of time. To implement this analysis, we used AspectJ to trace all get and set calls on the fields that contain message objects and all add and remove operations on collections¹ that hold the message objects. The shared queues are identified by a Python program that walks the trace to match the corresponding set/add and get/remove calls from different threads using message IDs.

Our definition of a stage matches the definition of an event handler by Reiss [7]. We thus use their dynamic event handler detection system to identify various stages in the system. All stages that created a message without retrieving a message were identified as start stages, and all stages that retrieved a message but didn't insert new messages for further processing were identified as end stages of the transaction. All other stages are treated as intermediate stages. It should be noted that the same stage may act as an end stage in one transaction and as an intermediate stage in another because of conditional branching.

¹ Since the interface to add and remove elements is not the same for all the collections, we have segregated relevant Java Collection API calls into add or remove calls. If a collection contains Java Objects (unspecialized collection) we only trace operations on objects of the message type.

3 Results and Conclusions

To evaluate the effectiveness of our analysis, we used our prototype system to automatically identify queues and event handlers in our web crawler and in Rupy [1], an open source HTTP server. The results of the analysis were compared against the results obtained through manual code inspection and were found to be identical. The analysis was also part of an automatic causal graph generator (under construction), using which we were able to generate a simple causal graph of the execution of the crawler, another indication that the analysis works.

In this paper, we have presented new program analysis techniques to automatically detect transactions, stages and queues in an MPS. We have also shown that these techniques are accurate by evaluating them on real world systems. We are currently working on using these techniques along with others to build a complete system for understanding message processing applications based on automated trace generation and analysis.

References

[1] http://code.google.com/p/rupy/.
[2] A. Chanda et al. Causeway: operating system support for controlling and analyzing the execution of distributed programs. In HotOS '05, pages 18–18.
[3] A. C. et al. Whodunit: transactional profiling for multi-tier applications. In EuroSys '07, pages 17–30.
[4] E. K. et al. Borderpatrol: isolating events for black-box tracing. In EuroSys '08, pages 191–203.
[5] M. Y. C. et al. Pinpoint: Problem determination in large, dynamic internet services. In DSN '02, pages 595–604.
[6] R. F. et al. X-Trace: A pervasive network tracing framework. In NSDI '07.
[7] S. P. Reiss. Dynamic detection of event handlers. In Proceedings of WODA '08, pages 1–7.
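
As an illustration of the trace-walking step described in Section 2, the following is a minimal Python sketch, not the authors' actual program. It assumes a hypothetical trace format of `(thread, op, container, message_id, stage)` tuples, where `op` is `"put"` for set/add calls and `"get"` for get/remove calls; the function names, the container names (`inbox`, `outbox`, `config`) and the stage names are all invented for the example. A container is reported as a shared queue only when the number of messages handed from one thread to a different thread exceeds the maximum thread count, which filters out fields that are merely initialized once from the main thread. The stage classification is simplified to work over the whole trace, whereas the paper notes that a stage's role can differ from one transaction to another.

```python
from collections import defaultdict


def find_shared_queues(trace, max_threads):
    """Return containers that exchange more than max_threads messages
    between different threads (assumed trace format, see lead-in)."""
    putters = {}                    # (container, message id) -> inserting thread
    exchanged = defaultdict(set)    # container -> ids handed across threads
    for thread, op, container, msg, _stage in trace:
        key = (container, msg)
        if op == "put":
            putters[key] = thread
        elif op == "get" and key in putters and putters[key] != thread:
            exchanged[container].add(msg)
    return {c for c, msgs in exchanged.items() if len(msgs) > max_threads}


def classify_stages(trace):
    """Label each stage as start, intermediate or end over the whole trace.
    Simplification: a "put" is treated as creating a message."""
    puts, gets = set(), set()
    for _thread, op, _container, _msg, stage in trace:
        (puts if op == "put" else gets).add(stage)
    roles = {}
    for stage in puts | gets:
        if stage not in gets:
            roles[stage] = "start"          # inserts without retrieving
        elif stage not in puts:
            roles[stage] = "end"            # retrieves without inserting
        else:
            roles[stage] = "intermediate"
    return roles


# Hypothetical three-stage pipeline running on three threads.
# "config" is a field set once by T0 and only read by the workers.
trace = [
    ("T0", "put", "config", "c0", "read"),
    ("T1", "get", "config", "c0", "parse"),
    ("T2", "get", "config", "c0", "write"),
]
for i in range(4):
    m = f"m{i}"
    trace += [
        ("T0", "put", "inbox", m, "read"),
        ("T1", "get", "inbox", m, "parse"),
        ("T1", "put", "outbox", m, "parse"),
        ("T2", "get", "outbox", m, "write"),
    ]

print(sorted(find_shared_queues(trace, max_threads=3)))
print(sorted(classify_stages(trace).items()))
```

In this sketch `config` carries only one message across threads, so it falls below the threshold and is ignored, while `inbox` and `outbox` each carry four.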