Sie sind auf Seite 1von 11

Byte Code Engineering

Markus Dahm
Freie Universit t Berlin a dahm@inf.fu-berlin.de

Abstract. The term Java is used to denote two different concepts: the language itself and the related execution environment, the Java Virtual Machine (JVM), which executes byte code instructions. Several research projects deal with byte code-generating compilers or the implementation of new features via byte code transformations. Examples are code optimization, the implementation of parameterized types for Java, or the adaptation of run-time behavior through load-time transformations. Many programmers are doing this by implementing their own specialized byte code manipulation tools, which are, however, restricted in the range of their reusability. Therefore, we have developed a general purpose framework for the static analysis and dynamic creation or transformation of byte code. In this paper we present its main features and possible application areas.

1 Introduction
Many research projects deal with extensions of the Java language [13] or improvements of its run-time behavior. Implementing new features in the Java execution environment (Java Virtual Machine, JVM) is relatively easy compared to other languages, because Java is an interpreted language with a small and easy-to-understand set of instructions (the byte code). The JAVA C LASS API which we present in this paper is a framework for the static analysis and dynamic creation or transformation of Java class les.1 It enables developers to deal with byte code on a high level of abstraction without handling all the internal details of the Java class le format. There are many possible application areas ranging from class browsers, prolers, byte code-optimizers, and compilers, to sophisticated run-time analysis tools and extensions to the Java language [1,21,3]. Other possibilities include the static analysis of byte code [22], automated delegation [8], or implementing concepts of Aspect-Oriented Programming [16]. We think that the most interesting application area for JAVA C LASS is meta-level programming, i.e. load-time reection [18], which will be discussed in detail in section 3.1. Our approach provides a truly object-oriented view upon Java byte code. For example, code is modeled as a list of instructions objects. Within such a list one may add or delete instructions, change the control ow, or search for certain patterns of code using regular expressions. We assume the reader to have some basic knowledge about the JVM and Java class les. A more detailed introduction to the API and the Virtual Machine can be found in
1

The JAVA C LASS distribution, including several code examples and javadoc manuals, is available at http://www.inf.fu-berlin.de/dahm/JavaClass/index.html.

[9]. The paper is structured as follows: We rst give a brief overview of related work and present some aspects and technical details of the framework in section 2. We then discuss concepts of byte code engineering and possible application areas in section 3 and conclude with section 4.

1.1 Related work The JOIE [7] toolkit can be used to augment class loaders with dynamic behavior. Similarly, Binary Component Adaptation [15] allows classes to be adapted and evolved on-the-y. Han Lees Byte-code Instrumenting Tool [17] allows the user to insert calls to analysis methods anywhere in the byte code. The Jasmin assembler [20] can be used to compile pseudo-assembler code. Kawa, a Java-based Scheme system, contains the gnu.bytecode package [5] to generate byte code. The metaXa Virtual Machine [12] allows to dynamically reify meta level events, e.g. instance eld access. In contrast to these projects, JAVA C LASS is intended to be a general purpose tool for byte code engineering. It gives the developer full control on a high level of abstraction and is not restricted to any particular application area.

2 The JavaClass framework


The JAVA C LASS framework consists of a static and a generic part. The former is not intended for byte code modications. It may be used, e.g., to analyze Java classes without having the source les at hand. The latter supplies an abstraction level for creating or transforming class les dynamically. It makes the static constraints of Java class les, like hard-coded byte code addresses, mutable. Using the term generic here may be a bit misleading, we should perhaps rather speak of a generating API. UML diagrams unfortunately too large for this paper describing the class hierarchy of the framework can be found in [9].

2.1 Static API All of the binary components and data structures declared in the JVM specication [19] are mapped to classes, where the top-level class is called JavaClass, giving the whole API its name. Instances of this class basically consist of a constant pool, elds, methods, symbolic references to the super class and to the implemented interfaces of the class. At run-time, these objects can be used as meta objects describing the contents of a class. This possibility will be discussed in detail in section 3.1. The constant pool serves as a central repository of the class and contains, e.g., entries describing the type signature of methods and elds. It also contains String, Integer, and other constants. Indexes to the constant pool may be contained in byte code instructions as well as in other components of a class le and in constant pool entries themselves.

Analyzing classes. Information within the class le components can be accessed via an intuitive set/get interface. Compilers may use the framework to analyze binary class les, e.g. in order to check whether they contain certain elds or methods. Using the provided class repository, implementing a simple class viewer is quite easy:
JavaClass clazz = Repository.lookupClass("java.lang.String"); System.out.println(clazz); // Print class contents Method[] methods = clazz.getMethods(); for(int i=0; i < methods.length; i++) { System.out.println(methods[i]); // Print method signature Code code = methods[i].getCode(); // Print byte code of if(code != null) // non-abstract, System.out.println(code); // non-native methods }

JAVA C LASS supports the Visitor design pattern [10], i.e. it allows developers to write their own visitors to traverse and analyze the contents of a class le. Included in the distribution, e.g., is a class JasminVisitor that converts class les into the Jasmin assembler language [20]. 2.2 Generic API This part of the API makes it possible to modify byte code components dynamically. The generic constant pool, for example, implemented by the class ConstantPoolGen, offers methods for adding different types of constants. Accordingly, ClassGen offers routines to add or delete methods, elds, and class attributes. Representation of types. We abstract from the concrete details of type signature syntax of the JVM [19] by introducing the Type class, which is used, for example, to dene the types of methods. Concrete sub-classes are BasicType, ObjectType and ArrayType. There are also some predened constants for common types. For example, the type signature of the main method is represented by:
Type ret_type = Type.VOID; Type[] arg_types = new Type[] {new ArrayType(Type.STRING, 1)};

Generic elds and methods. Fields are represented by FieldGen objects. If they have the access rights static final, i.e. are constants, they may optionally have an initializing value. MethodGen objects contain routines to add local variables, thrown exceptions, and exception handlers. Because exception handlers and local variables contain references to byte code addresses, they also take the role of an instruction targeter in our terminology. Instruction targeters contain a method updateTarget() to redirect such references. The code of methods is represented by instruction lists that contain instruction objects. References to byte code addresses are implemented by handles to instruction objects. This is explained in more detail in the following sections.

Instruction objects. Modeling instructions as objects may look somewhat odd at rst sight, but in fact enables programmers to obtain a high-level view upon control ow without handling details like concrete byte code addresses. Instruction objects basically consist of a tag, i.e. an opcode and their length in bytes. The instruction set of the Java Virtual Machine distinguishes its operand types using different instructions to operate on values of specic type. For example iload loads an integer value onto the stack, while fload loads a oat value. Our approach enables us to group instructions via sub-classing. For example, there are branch instructions like goto and if icmpeq, which compares two integers for equality. They additionally contain an address (offset) within the byte code as the branch target. Obviously, this makes them candidates for playing an instruction targeter role, too. For debugging purposes it may even make sense to invent your own instructions. In a sophisticated code generator, e.g., it may be very difcult to track back in which method particular code has actually been created. One could think of a specialized nop (No operation) instruction (which may be inserted anywhere without effect) that contains additional debugging information. One could also think of new byte code instructions operating on complex numbers that are replaced by normal byte code upon loadtime or are recognized by a new JVM. We will not list all byte code instructions here, since these are explained in detail in the JVM specication [19]. The opcode names are mostly self-explaining, so understanding the following code examples should be fairly intuitive. 2.3 Instruction lists An instruction list is implemented by a list of instruction handles encapsulating instruction objects. References to instructions in the list are thus not implemented by direct pointers to instructions but by pointers to instruction handles. This makes appending, inserting and deleting areas of code very simple. Since we use symbolic references, computation of concrete byte code offsets does not need to occur until nalization, i.e. until the user has nished the process of generating or transforming code. Instruction handles can be used as some kind of symbolic addresses, for example in order to dene the scope of an exception handler:
InstructionHandle start, end, handler; ... // Define start and end of handled area, and handler code ObjectType ot = new ObjectType("java.io.IOException"); mg.addExceptionHandler(start, end, handler, ot);

We will use the terms instruction and instruction handle synonymously for the rest of the paper. Appending instructions. Instructions can be appended before or after any given handle within a list. The default is to append the instruction to the end of the list. All append methods return a new instruction handle which may then be used as the target of a branch instruction, e.g..

InstructionList il = new InstructionList(); GOTO g = new GOTO(null); // Target of branch not known yet il.append(g); // Append it to the end of the list ... g.setTarget(il.append(new NOP())); // Set branch target

Deleting instructions. Deletion of instructions is also very straightforward: all instruction handles and the contained instructions within a given range are removed from the instruction list and disposed. The delete() method may throw a TargetLostException when there are instruction targeters still referencing one of the deleted instructions. The user is forced to handle such exceptions and redirect those references where the necessary informations are stored in the exception object.
try { il.delete(start, end); } catch(TargetLostException e) { ... }

2.4 Instruction factories When producing byte code, some patterns typically occur very frequently, for instance the compilation of arithmetic or comparison expressions. One certainly does not want to copy the code that translates such expressions to every place they may appear. Instead, we supply instruction factories [10] and compound instructions. Instances of the latter may be appended just like normal instructions. For example, pushing constants onto the operand stack may be coded in different ways. There are some short-cut instructions that can be used to make the produced byte code more compact. The smallest instruction to push a single 1 onto the stack is iconst 1, other possibilities are bipush (can be used to push values between -128 and 127), sipush (between -32768 and 32767), or ldc (load constant from constant pool). Instead of repeatedly selecting the most compact instruction in, say, a switch, one can use the compound PUSH instruction whenever pushing a constant number or string. It will produce the appropriate byte code instructions when added to an instruction list and automatically insert entries into the constant pool if necessary.
InstructionList il = new InstructionList(); il.append(new PUSH(constant_pool, "Hello, world")); il.append(new PUSH(constant_pool, 4711));

2.5 Representing exception handlers and local variables In Figure 1 we present how the code of the readInt() method is mapped to an instruction list. The local variables n and e both hold two references to instructions, dening their scope. An exception handlers is displayed, too: it references the start and the end of the try block and also the handling code. Altogether, there are three kinds of instruction targeters: local variables, exception handlers and branch instructions.

private static BufferedReader in = new BufferedReader(new InputStreamReader(System.in));


star t

sipush 4711 istore n getstatic in


rt sta

public static final int readInt() { int n = 4711; try { n = Integer.parseInt(in.readLine()); } catch(IOException e) { System.err.println(e); } ... return n; }

Local var
istore n int n goto astore e getstatic out aload e
end

end

Exception handler

ha nd
sta rt

ler

Local variable IOException e


en d

invokevirtual goto

iload n ireturn

Fig. 1. Instruction list for readInt() method 2.6 Code patterns When transforming code, for instance during optimization or when inserting analysis method calls, one typically searches for certain patterns of code on which to perform the transformation. In order to handle such situations JAVA C LASS includes a novel feature: One can search for given code patterns within an instruction list using regular expressions. In regular expressions, instructions may be represented by symbolic names, e.g. IfInstruction or ILOAD . Meta symbols like +, *, and (..|..) have their usual meanings. Currently we are using string literals to represent regular expressions, a future version of the API may introduce a more advanced data structure. Additional constraints to the matching area of instructions, which can not be implemented in terms of regular expressions, may be expressed via code constraints. Section 2.8 presents an example for the usage of this feature. 2.7 Example I: Compiling an if statement The following example shows how a simple compiler would translate the Java statement
if(a == null) a = b;

into byte code:


InstructionList il = new InstructionList(); IfInstruction i = new IFNONNULL(null); il.append(new ALOAD(0)); // Load local variable 0(a) on stack il.append(i); // Use negated condition il.append(new ALOAD(1)); // Load local variable 0(b) on stack il.append(new ASTORE(0)); // Store in a // Define auxiliary target for the case a != null i.setTarget(il.append(new NOP());

2.8 Example II: Peep hole optimizer In Java, boolean values are mapped to 1 and 0. Thus, the simplest way to evaluate boolean expressions like a == null during compilation is to push a 1 or a 0 onto the operand stack. But this way, the subsequent combination of boolean expressions (with &&, e.g) yields long chunks of code that push lots of 1s and 0s onto the stack. Additionally, one has to add a lot of nop operations as auxiliary branch targets. Such code chunks can be optimized using a peep hole algorithm [2]: An IfInstruction, that either produces a 1 or a 0 on the stack and is followed by an ifeq instruction (branch if stack operand equals to 0) is replaced by the negated IfInstruction with its branch target replaced by the target of the ifeq instruction. Figure 2 illustrates the results of the algorithm: The left column shows the original Java expression, the middle one the naive translation into byte code, and the right one shows an optimized version.
if(a == null) a = b; 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: aload_0 ifnull #5 iconst_0 goto #6 iconst_1 nop ifeq #10 aload_1 astore_0 nop 1: 2: 3: 4: 5: aload_0 ifnonnull #5 aload_1 astore_0 nop

Fig. 2. Optimizing byte code This algorithm can be implemented like this:
FindPattern f = new FindPattern(il); f.search("IfInstructionICONST_0GOTO" + "ICONST_1NOPIFEQ", constraint); InstructionHandle[] match = f.getMatch(); IfInstruction if_ = (IfInstruction)match[0]. getInstruction().negate(); // Negate instruction match[0].setInstruction(if_); if_.setTarget(match[5].getTarget()); try { il.delete(match[1], match[5]); } catch(TargetLostException e) { ... }

// Replace instruction // Update branch target // Remove obsolete code // Update targeters

Subsequent application of this algorithm removes all unnecessary stack operations and branch instructions from the byte code. If any of the deleted instructions is still referenced by an instruction targeter object, the reference needs to be updated in the catch-clause.

Code constraints. The above applied code constraint object ensures that the matched code really can be transformed, i.e. it checks the targets of the branch instructions:
CodeConstraint constraint = new CodeConstraint() { public boolean checkCode(InstructionHandle[] m) { IfInstruction if_ = (IfInstruction)m[0].getInstruction(); GOTO g = (GOTO)m[2].getInstruction(); return (if_.getTarget() == m[3]) && (g.getTarget() == m[4]); } };

3 Concepts of Byte Code Engineering


Byte code engineering techniques can be used in many application areas. We think that the most interesting application area is the user-transparent adaptation of existing code at run-time. For example, a proling tool can dynamically insert calls to analysis methods without affecting the semantics of the original code. There are also cases where one has to adapt classes for a certain environment when the source code is not available. 3.1 Load-time Reection The terms reection or meta-level programming are generally used to denote systems that have the ability to reason about themselves, where certain aspects of the system are reied as meta objects [6]. On the one hand, separating meta-level from base-level code can separate concerns, e.g., security or persistence aspects can be addressed in the metaprogram, thus enhancing base-level code reusability. On the other hand, reection can provide more exibility and better adaptability to changing environments. One prominent application area for reection are programming tools like browsers, debuggers, and prototyping environments. Java itself provides an API for structural run-time reection [14], which can be used to retrieve type information about objects, to dynamically invoke methods, or to access elds of a given object. Barat [4], in contrast, is an approach for compile-time reection. Using class loaders and JAVA C LASS, one can add an additional level of reection with Smalltalk-like features [11] to Java: load-time reection. Class loaders are responsible for loading class les from the le system or other resources and passing the byte code to the Virtual Machine [18]. Custom ClassLoader objects may be used to replace the standard procedure of loading a class. A given runtime system can then access JAVA C LASS meta-level objects created at load-time and adapt them to its needs, or even create them ad hoc without a source le. This is an elegant way of extending the Java Virtual Machine without actually modifying it. Similar to metaXa [12], for example, it is possible to reify events like method calls, instance creations or eld accesses, but without the need to change the Virtual Machine. The reication of such an event can be implemented by simply enclosing or replacing the actual byte code instruction with invocations of runtime system methods.

A possible scenario is described in gure 3: During run-time the Virtual Machine requests a custom class loader to load a given class. Before the JVM actually sees the byte code, the class loader makes a side-step to create the requested class or to perform some transformation to an existing class. To make sure that the modied byte code is still valid and does not violate any of the JVMs rules it is checked by the verier before the JVM nally executes it.

Java class file

Class loader

Byte code verifier

Interpreter/JIT

JavaClass Byte code transformations

Fig. 3. Class loaders

3.2 Example III: Reloading classes From the viewpoint of a run-time system it is sometimes desirable to reload a given class, i.e. to drop the old byte code and load a new implementation. This is not possible with the JVM, of course, but we can simulate the effect using byte code transformations. We illustrate this with the following simple example:
public class ReloadMe { public ReloadMe() { ... } public String foo(int bar) { ... } }

We then implement a class loader that globally replaces all instance creations of this class with calls to an object factory. I.e., all statements like o = new ReloadMe(); become o = Factory.createReloadMe(); (on the byte code level). The factory is initialized with the class object of ReloadMe and uses it as a template for object creation (if the constructor takes arguments we have to use Constructor objects of the Java Reection API as templates).
public class Factory { Class reload_me = ReloadMe.class; ... public static ReloadMe createReloadMe() { return (ReloadMe)reload_me.newInstance(); } }

When the run-time system decides to load a new implementation of ReloadMe it simply replaces the template class of the factory with a new class that extends the

original one. This new class is generated using the JAVA C LASS API and passed to the class loader:
JavaClass clazz = generateNew("ReloadMe"); Factory.reload_me = class_loader.load(clazz.getBytes());

Going even further we can reimplement the behavior of single methods, too, if we insert a hook at the start of every method whose behavior we want to adapt. The hook is inactive when it has a null value. Otherwise it redirects all incoming calls, i.e. we use delegation [10].
public String foo(int bar) { if(hook != null) return hook.foo(this, bar); ... // Old code becomes unreachable }

It would be awkward to incorporate this explicit reloading and delegation code into the source of the original class. But on the meta (byte code) level we can easily implement user-transparent behavorial reection features like the ones described. Another useful example for the transformation of new statements may be to avoid costly thread creation operations by replacing them with factory methods that reuse old thread objects. This is also a good example of how to add optimizations on the meta level that are orthogonal to the original program and thus do not affect its semantics.

4 Conclusion and Future Work


In this paper we have presented the JAVA C LASS framework, a general purpose tool for byte code engineering. It provides the developer with a completely object-oriented view upon byte code and allows him to conveniently analyze, transform or create classes. In combination with class loaders, JAVA C LASS is a powerful tool to implement reectional features on a meta level not visible to the user. There are many possible application areas ranging from class browsers, prolers, byte code-optimizers, and compilers to sophisticated run-time analysis tools and extensions to the Java language. The framework has already proved its usefulness in several projects. We have been able, e.g., to implement the byte code-generating back end of a Java compiler within ten days. We plan to further enhance JAVA C LASS with more support for load-time reection and control ow analysis on the byte code level.

References
1. O. Agesen, S. N. Freund, and J. C. Mitchell. Adding Type Parameterization to the Java Language. In Proceedings OOPSLA97, Atlanta, GA, 1997. 2. A. Aho, R. Sethi, and J. Ullman. COMPILERS Principles, Techniques and Tools. AddisonWesley, 1985. 3. B. Bokowski and M. Dahm. Poor Mans Genericity for Java. In Clemens Cap, editor, Proceedings JIT98. Springer, 1998. 4. B. Bokowski and A. Spiegel. Barat A Front-End for Java. Technical report, Freie Universit t Berlin, 1998. a 5. Per Bothner. The gnu.bytecode package. http://www.cygnus.com/bothner/gnu.bytecode/, 1998. 6. Gerald Brose. Reection in Java, CORBA and JacORB. In Clemens Cap, editor, Proceedings JIT98. Springer, 1998. 7. Geoff Cohen, Jeff Chase, and David Kaminsky. Automatic Program Transformation with JOIE. In Proceedings USENIX Annual Technical Symposium, 1998. 8. Pascal Costanza. The ClassFilters package. Universit t Bonn, a http://www.cs.uni-bonn.de/costanza/ClassFilters/, 1998. 9. M. Dahm. Byte Code Engineering with the JavaClass API. Technical report, Freie Universit t Berlin, 1998. a 10. E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995. 11. A. Goldberg and D.Robson. Smalltalk The Language. Addison-Wesley, 1983. 12. Michael Golm and J rgen Klein der. metaXa and the Future of Reection. In Workshop on u o Reective Programming in C++ and Java, 1998. 13. J. Gosling, B. Joy, and G. Steele. The Java Language Specication. Addison-Wesley, 1996. 14. JavaSoft. Reection API. http://java.sun.com/products/jdk/1.1/docs/guide/reflection/, 1998. 15. Ralph Keller and Urs H lzle. Binary Component Adaptation. In Eric Jul, editor, Proceedings o ECOOP98. Springer, 1998. 16. Gregor Kiczales, John Lamping, Anurag Mendhekar, Chris Maeda, Cristina Lopes, JeanMarc Loingtier, and John Irwin. Aspect-Oriented Programming. Technical report, Xerox Palo Alto Research Center, 1997. 17. Han Bok Lee and Benjamin G. Zorn. BIT: A Tool for Instrumenting Java Bytecodes. In Proceedings USENIX Symposium on Internet Technologies and Systems, 1998. 18. Sheng Lian and Gilad Bracha. Dynamic Class Loading in the Java Virtual Machine. In Proceedings OOPSLA98, 1998. 19. Tim Lindholm and Frank Yellin. The Java Virtual Machine Specication. Addison-Wesley, 1997. 20. J. Meyer and T. Downing. Java Virtual Machine. OReilly, 1997. 21. A.C. Myers, J. A. Bank, and B. Liskov. Parameterized Types for Java. In Proceedings POPL97, Paris, France, 1997. 22. M. Thies and U. Kastens. Statische Analyse von Bibliotheken als Grundlage dynamischer Optimierung. In Clemens Cap, editor, Proceedings JIT98. Springer, 1998.

Das könnte Ihnen auch gefallen