Java features such as dynamic class loading and reflection make it a dynamic language. However, in many cases, reflection is not sufficient, and developers need to generate bytecode from non-Java source code, such as scripting languages like Groovy (JSR-241) or BeanShell (JSR-274), or from metadata such as an OR-mapping configuration. When working with existing classes, and especially when original Java sources are not available, some tools may need to do a static analysis of the interdependencies or even method behavior in order to produce test coverage or metrics, or to detect bugs and anti-patterns. New features added to into Java 5, such as annotations and generics, affected bytecode structure and require special attention from bytecode manipulation tools to maintain good performance. This article will give an overview of one of the smallest and fastest bytecode manipulation frameworks available for Java.
|
Related Reading
Hardcore Java |
The ASM bytecode manipulation framework is written in Java and uses a visitor-based approach to generate bytecode and drive transformations of existing classes. It allows developers to avoid dealing directly with a class constant pool and offsets within method bytecode, thus hiding bytecode complexity from the developer and providing better performance, compared to other tools such as BCEL, SERP, or Javassist.
ASM is divided into several packages that allow flexible bundling. The packaging arrangement is shown in Figure 1.

Figure 1. Arrangement of ASM packages
The next few sections will give an introduction to the Core
package of the ASM framework. To get a better understanding of
the organization of this package, you have to have some basic
understanding of the bytecode structures that are defined in the
JVM
specification. Here is a high-level diagram of the class file
format ([*] marks repeatable structures).
[1]-------------------------------------------+
| Header and Constant Stack |
+--------------------------------------------+
| [*] Class Attributes |
[2]------------+------------------------------+
| [*] Fields | Field Name, Descriptor, etc |
| +------------------------------+
| | [*] Field Attributes |
[3]------------+------------------------------+
| [*] Methods | Method Name, Descriptor, etc |
| +------------------------------|
| | Method max stack and locals |
| |------------------------------|
| | [*] Method Code table |
| |------------------------------|
| | [*] Method Exception table |
| |------------------------------|
| | [*] Method Code Attributes |
| +------------------------------|
| | [*] Method Attributes |
+-------------+------------------------------+
|
Here are a few things to notice:
As you can see, bytecode tweaking isn't easy. However, the ASM framework reduces the complexity of the underlying structures and provides a simplified API that still allows for access to all bytecode information and enables complex transformations.
The Core package uses a push approach (similar to the "Visitor" design
pattern, which is also used in the SAX API for XML processing) to walk
trough complex bytecode structures. ASM defines several interfaces,
such as ClassVisitor (section [1] in the class file format diagram above),
FieldVisitor (section [2]), MethodVisitor
(section [3]), and AnnotationVisitor.
AnnotationVisitor is a special interface that allows
you to express hierarchical annotation structures. The next few
paragraphs will show how these interfaces interact with each other
and how they can be used together to implement bytecode
transformations and/or capture information from the bytecode.
The Core package can be logically divided into two major parts:
ClassReader or a
custom class that can fire the proper sequence of calls to the methods
of the above visitor classes.ClassWriter,
FieldWriter, MethodWriter, and
AnnotationWriter), adapters (ClassAdapter
and MethodAdapter), or any other classes implementing
the above visitor interfaces.Figure 2 shows the sequence diagram for the common producer-consumer interaction.

Figure 2. Sequence diagram for producer-consumer
interaction
In this interaction, a client application creates
ClassReader and calls the accept()
method, passing a concrete ClassVisitor instance as a
parameter. Then ClassReader parses the class and fires
"visit" events to ClassVisitor for each bytecode
fragment. For repeated contexts, such as fields, methods, or
annotations, a ClassVisitor may create child visitors
derived from the corresponding interface
(FieldVisitor, MethodVisitor, or
AnnotationVisitor) and return them to the producer. When
a producer receive a null value for FieldVisitor or
MethodVisitor, it skips that fragment of the class
(e.g., a ClassReader wouldn't even parse the
corresponding bytecode section in such a case, which leads to a
sort of "lazy loading" feature driven by the visitors). Otherwise,
the corresponding subcontext events are delegated to the child
visitor instance. At the end of each subcontext, the producer calls
the visitEnd() method and then moves on to the next
section (e.g., the next field, method, etc.).
|
Bytecode consumers can be linked together in a "Chain
of responsibility" pattern by either manually delegating events
to the next visitor in the chain, or with using visitors derived
from ClassAdapter and/or MethodAdapter
that delegate all visit methods to their underlying visitors. Those
delegators act as bytecode consumers from one side and as bytecode
producers from the other. They can decide to modify natural
delegation in order to implement specific bytecode
transformation:
The chain can be ended by a ClassWriter visitor,
which will produce the resulting bytecode. For example:
ClassWriter cw = new ClassWriter(computeMax);
ClassVisitor cc = new CheckClassAdapter(cw);
ClassVisitor tv =
new TraceClassVisitor(cc, new PrintWriter(System.out));
ClassVisitor cv = new TransformingClassAdapter(tv);
ClassReader cr = new ClassReader(bytecode);
cr.accept(cv, skipDebug);
byte[] newBytecode = cw.toByteArray();
In the above code, the TransformingClassAdapter
implements custom class transformations and sends the results to the
TraceClassVisitor passed to its constructor.
TraceClassVisitor prints the transformed class and
delegates the same events to CheckClassAdapter, which
does simple bytecode verification and then passes the event to the
ClassWriter.
Most of the visit methods receive simple parameters such as
int, boolean, and String. In
all visit methods where String parameters refer to
constants within the bytecode, ASM uses the same representation as
used by the JVM. For instance, all class names (e.g. super class,
interfaces, exceptions, owner classes for the field, and methods
referenced from method code) should be specified in the
Internal Form. Field and method descriptors (usually the
desc parameter) also should be in
JVM representation. The same approach is taken for
signature parameters used to represent generics
information, so they should follow the grammar defined in Section 4.4.4
of the
Revised Class File Format (PDF). This approach helps avoid
unnecessary calculations when no transformation is required. To
help construct and parse such descriptors, there is a
Type class that provides several static methods:
String getMethodDescriptor(Type returnType, Type[]
argumentTypes)String getInternalName(Class c)String getDescriptor(Class c)String getMethodDescriptor(Method m)Type getType(String typeDescriptor)Type getType(Class c)Type getReturnType(String methodDescriptor)Type getReturnType(Method m)Type[] getArgumentTypes(String
methodDescriptor)Type[] getArgumentTypes(Method m)Note that these descriptors are using an "erasured" representation, which does not contain generics
information. Generics info is actually stored as a separate
bytecode attribute, but ASM takes care of this attribute and passes
the generic signature string in a signature parameter
of the appropriate visit methods. The value of the
signature string also uses the JVM representation (see
Section 4.4.4 in the "Class File Format" (PDF) as revised for Java 5), which uniquely maps
from the generic declarations in Java code, but presents an
additional challenge for tools that need to retrieve details from
generic types. To handle signatures, ASM provides
SignatureVisitor, SignatureReader, and
SignatureWriter classes modelled in a way similar to
other ASM visitors, as illustrated in Figure 3.

Figure 3. Sequence diagram for Signature classes
The Util package contains TraceSignatureVisitor,
which implements SignatureVisitor and allows you to
convert a signature value into a Java declaration with
generic types. The following example converts a method signature
into a Java method declaration.
TraceSignatureVisitor v =
new TraceSignatureVisitor(access);
SignatureReader r = new SignatureReader(sign);
r.accept(v);
String genericDecl = v.getDeclaration();
String genericReturn = v.getReturnType();
String genericExceptions = v.getExceptions();
String methodDecl = genericReturn + " " +
methodName + genericDecl;
if(genericExceptions!=null) {
methodDecl += " throws " + genericExceptions;
}
Up to this point, we have talked about the general design of the ASM framework and manipulating class structure. However, the most interesting part is how ASM handles method code.
|
In ASM, a method declaration is represented by the
ClassVisitor.visitMethod(), and the rest of the method
bytecode artifacts (Section [3] on class file
format diagram) are represented by number of the visit methods
in MethodVisitor. These methods are called in the
following order, where "*" marks repeated methods and "?" marks
methods that can be called once at most. In addition, the
visit...Insn and visitLabel
methods must be called in the sequential order of the bytecode
instructions of the visited code, and the
visitTryCatchBlock, visitLocalVariable,
and visitLineNumber methods must be called after the
labels passed as arguments have been visited.
| ? |
visitAnnotationDefault |
Visits the default value for annotation interface method |
| * |
visitAnnotation |
Visits a method annotation |
| * |
visitParameterAnnotation |
Visits a method parameter annotation |
| * |
visitAttribute |
Visits a non-standard method attribute |
| ? |
visitCode |
Starts the visit of the method's code for non-abstract and non-native methods |
| * | visitInsn |
Visits a zero operand instruction: NOP, ACONST_NULL, ICONST_M1,
ICONST_0, ICONST_1, ICONST_2, ICONST_3, ICONST_4, ICONST_5,
LCONST_0, LCONST_1, FCONST_0, FCONST_1, FCONST_2, DCONST_0,
DCONST_1, IALOAD, LALOAD, FALOAD, DALOAD, AALOAD, BALOAD, CALOAD,
SALOAD, IASTORE, LASTORE, FASTORE, DASTORE, AASTORE, BASTORE,
CASTORE, SASTORE, POP, POP2, DUP, DUP_X1, DUP_X2, DUP2, DUP2_X1,
DUP2_X2, SWAP, IADD, LADD, FADD, DADD, ISUB, LSUB, FSUB, DSUB,
IMUL, LMUL, FMUL, DMUL, IDIV, LDIV, FDIV, DDIV, IREM, LREM, FREM,
DREM, INEG, LNEG, FNEG, DNEG, ISHL, LSHL, ISHR, LSHR, IUSHR, LUSHR,
IAND, LAND, IOR, LOR, IXOR, LXOR, I2L, I2F, I2D, L2I, L2F, L2D,
F2I, F2L, F2D, D2I, D2L, D2F, I2B, I2C, I2S, LCMP, FCMPL, FCMPG,
DCMPL, DCMPG, IRETURN, LRETURN, FRETURN, DRETURN, ARETURN, RETURN,
ARRAYLENGTH, ATHROW, MONITORENTER, or MONITOREXIT. |
visitFieldInsn |
Visits a field instruction: GETSTATIC, PUTSTATIC, GETFIELD or
PUTFIELD. |
|
visitIntInsn |
Visits an instruction with a single int operand: BIPUSH, SIPUSH,
or NEWARRAY. |
|
visitJumpInsn |
Visits a jump instruction: IFEQ, IFNE, IFLT, IFGE, IFGT, IFLE,
IF_ICMPEQ, IF_ICMPNE, IF_ICMPLT, IF_ICMPGE, IF_ICMPGT, IF_ICMPLE,
IF_ACMPEQ, IF_ACMPNE, GOTO, JSR, IFNULL, or IFNONNULL. |
|
visitTypeInsn |
Visits a type instruction: NEW, ANEWARRAY, CHECKCAST, or
INSTANCEOF. |
|
visitVarInsn |
Visits a local variable instruction: ILOAD, LLOAD, FLOAD,
DLOAD, ALOAD, ISTORE, LSTORE, FSTORE, DSTORE, ASTORE, or RET. |
|
visitMethodInsn |
Visits a method instruction: INVOKEVIRTUAL, INVOKESPECIAL,
INVOKESTATIC, or INVOKEINTERFACE. |
|
visitIincInsn |
Visits an IINC instruction. |
|
visitLdcInsn |
Visits an LDC instruction. |
|
visitMultiANewArrayInsn |
Visits a MULTIANEWARRAY instruction. |
|
visitLookupSwitchInsn |
Visits a LOOKUPSWITCH instruction. |
|
visitTableSwitchInsn |
Visits a TABLESWITCH instruction. |
|
visitLabel |
Visits a label. | |
visitLocalVariable |
Visits a local variable declaration. | |
visitLineNumber |
Visits a line number declaration. | |
visitTryCatchBlock |
Visits a try-catch block. | |
visitMaxs |
Visits the maximum stack size and the maximum number of local variables of the method. | |
visitEnd |
Visits the end of the method. | |
Note that the visitEnd method must
always be called at the end of method processing.
ClassReader does that for you, but it should be taken
care of in a custom bytecode producer; e.g., when a class is
generated from scratch or when new methods are introduced.
Also note that if a method actually has some bytecode (i.e.,
if it is not abstract and not a native method), then
visitCode must be called before the first
visit...Insn call, and the visitMaxs
method must be called after last visit...Insn
call.
Each of the visitIincInsn,
visitLdcInsn, visitMultiANewArrayInsn,
visitLookupSwitchInsn, and
visitTableSwitchInsn methods uniquely represent one
bytecode instruction. The rest of the visit...Insn
methods--namely, visitInsn,
visitFieldInsn, visitIntInsn,
visitJumpInsn, visitTypeInsn,
visitVarInsn, and visitMethodInsn--represent more then one bytecode instruction, with their opcodes
passed in a first method parameter. All constants for those opcodes
are defined in the Opcodes interface. This approach is
very performant for bytecode parsing and formatting. Unfortunately,
this could be a challenge to the developer who is trying to
generate code, because ClassWriter does not verify
these constraints. However, there is a
CheckClassAdapter that could be used during
development to test generated code.
Another challenge with any kind of bytecode generation or
transformation is that offsets within method code can change and
should be adjusted when additional instructions are inserted or
removed from the method code. This is applicable to parameters of
all jump opcodes (if, goto,
jsr, and switch), as well as to try-catch
blocks, line number and local variable declarations, and to some of
the special attributes (e.g., StackMap, used by CLDC).
However, ASM hides this complexity from the developer. In order to
specify positions in the method bytecode and not have to use
absolute offsets, a unique instance of the Label class
should be passed to the visitLabel method. Other
MethodVisitor methods such as
visitJumpInsn, visitLookupSwitchInsn,
visitTableSwitchInsn, visitTryCatchBlock,
visitLocalVariable, and visitLineNumber
can use these Label instances even before the
visitLabel call, as long as the instance will be called later in
a method.
The above may sound complicated, and at first glance requires
deep knowledge of the bytecode instructions. However, using
ASMifierClassVisitor on compiled classes allows you to
see how any given bytecode could be generated with ASM. Moreover,
applying ASMifier on two compiled classes (an original one and one
after applying the required transformation) and then running
diff on the output gives a good hint as to what ASM
calls should be used in the transformer. This process is explained in
more detail in several articles (see the Resources section below). There is even a plugin for the
Eclipse IDE, shown in Figure 4, that provides a great support
for generating ASM code and comparing ASMifier output right from
Java sources, and also includes a contextual bytecode reference.

Figure 4. Eclipse ASM plugin (Click on the picture to see a
full-size image)
|
There are already a few articles that explain how to generate bytecode with ASM (see the Resources section for some links). For a change, let's see how ASM can be used to analyze existing classes. One interesting application is to capture information about external classes or packages used by any given module or .jar file. For simplicity, this example will only capture outgoing dependencies and won't keep track of the dependency types (e.g., superclass, method parameters, local variable types, etc.).
Notice that for analysis purposes, we don't need to create new instances of child visitors for annotations, fields, and methods. All of these visitors, including class and signature visitor, could be implemented in a single class:
public class DependencyVisitor implements
AnnotationVisitor, SignatureVisitor,
ClassVisitor, FieldVisitor, MethodVisitor {
...
For this example, we will track dependencies between packages, so individual classes should be aggregated by the package name:
private String getGroupKey(String name) {
int n = name.lastIndexOf('/');
if(n>-1) name = name.substring(0, n);
packages.add(name);
return name;
}
In order to collect dependencies, visitor interfaces such as
ClassVisitor, AnnotationVisitor,
FieldVisitor, and MethodVisitor should
selectively aggregate parameters of their methods. There are
several common cases:
First of all, there are class names in
internal form (super class, interfaces, exceptions, field and
method owners); e.g., java/lang/String:
private void addName(String name) {
if(name==null) return;
String p = getGroupKey(name);
if(current.containsKey(p)) {
current.put(p, current.get(p)+1);
} else {
current.put(p, 1);
}
}
In this case, current is the current group of
dependencies (e.g., package).
Another case is type descriptors (annotations, enum and field
types, parameters of the newarray instruction, etc.); e.g.,
Ljava/lang/String;, J, and [[[I.
These can be parsed with Type.getType( desc) to obtain
the class name in internal form:
private void addDesc(String desc) {
addType(Type.getType(desc));
}
private void addType(Type t) {
switch(t.getSort()) {
case Type.ARRAY:
addType(t.getElementType());
break;
case Type.OBJECT:
addName(t.getClassName().replace('.','/'));
break;
}
}
Method descriptors used in method declarations and in invoke
instructions describe parameter types and return a type; e.g.,
([java/lang/String;II)V. The helper methods
Type.getReturnType(methodDescriptor) and
Type.getArgumentTypes(methodDescriptor) can parse such
descriptors and extract parameter and return types.
private void addMethodDesc(String desc) {
addType(Type.getReturnType(desc));
Type[] types = Type.getArgumentTypes(desc);
for(int i = 0; i < types.length; i++) {
addType(types[ i]);
}
}
The special case is the signature parameter used in
many "visit" methods to specify Java 5 generics info. If it is present
(i.e., non-null), this parameter overrides the descriptor parameter
and contains an encoded form of the generics information.
SignatureReader class could be used to parse this
value. So we can implement a SignatureVisitor, which
will be called for each signature artifact.
private void addSignature(String sign) {
if(sign!=null) {
new SignatureReader(sign).accept(this);
}
}
private void addTypeSignature(String sign) {
if(sign!=null) {
new SignatureReader(sign).acceptType(this);
}
}
|
Methods implementing the ClassVisitor interface,
such as visit(), visitField(),
visitMethod(), and visitAnnotation(), can
collect information about dependencies on superclasses and
interfaces, types used by fields, method parameters, return values,
and exceptions, as well as types of the annotations. For
example:
public void visit(int version, int access,
String name, String signature,
String superName, String[] interfaces) {
String p = getGroupKey(name);
current = groups.get(p);
if(current==null) {
current = new HashMap<String,Integer>();
groups.put(p, current);
}
if(signature==null) {
addName(superName);
addNames(interfaces);
} else {
addSignature(signature);
}
}
public FieldVisitor visitField(int access,
String name, String desc,
String signature, Object value) {
if(signature==null) {
addDesc(desc);
} else {
addTypeSignature(signature);
}
if(value instanceof Type) {
addType((Type) value);
}
return this;
}
public MethodVisitor visitMethod(int access,
String name, String desc,
String signature, String[] exceptions) {
if(signature==null) {
addMethodDesc(desc);
} else {
addSignature(signature);
}
addNames(exceptions);
return this;
}
public AnnotationVisitor visitAnnotation(
String desc, boolean visible) {
addDesc(desc);
return this;
}
Methods implementing the MethodVisitor interface can
collect dependencies on types of the parameter annotations and
types used in bytecode instructions that can use object
references:
public AnnotationVisitor
visitParameterAnnotation(int parameter,
String desc, boolean visible) {
addDesc(desc);
return this;
}
/**
* Visits a type instruction
* NEW, ANEWARRAY, CHECKCAST or INSTANCEOF.
*/
public void visitTypeInsn(int opcode,
String desc) {
if(desc.charAt(0)=='[') {
addDesc(desc);
} else {
addName(desc);
}
}
/**
* Visits a field instruction
* GETSTATIC, PUTSTATIC, GETFIELD or PUTFIELD.
*/
public void visitFieldInsn(int opcode,
String owner, String name, String desc) {
addName(owner);
addDesc(desc);
}
/**
* Visits a method instruction INVOKEVIRTUAL,
* INVOKESPECIAL, INVOKESTATIC or
* INVOKEINTERFACE.
*/
public void visitMethodInsn(int opcode,
String owner, String name, String desc) {
addName(owner);
addMethodDesc(desc);
}
/**
* Visits a LDC instruction.
*/
public void visitLdcInsn(Object cst) {
if(cst instanceof Type) {
addType((Type) cst);
}
}
/**
* Visits a MULTIANEWARRAY instruction.
*/
public void visitMultiANewArrayInsn(
String desc, int dims) {
addDesc(desc);
}
/**
* Visits a try catch block.
*/
public void visitTryCatchBlock(Label start,
Label end, Label handler, String type) {
addName(type);
}
|
Now we can use DependencyVisitor to collect
dependencies from the entire .jar file. For example:
DependencyVisitor v = new DependencyVisitor();
ZipFile f = new ZipFile(jarName);
Enumeration<? extends ZipEntry> en = f.entries();
while(en.hasMoreElements()) {
ZipEntry e = en.nextElement();
String name = e.getName();
if(name.endsWith(".class")) {
ClassReader cr =
new ClassReader(f.getInputStream(e));
cr.accept(v, false);
}
}
The collected information can be represented in many different ways. One can build dependency trees and calculate some metrics, or create some visualizations. For example, Figure 5 shows how ant.1.6.5.jar looks in a visualization I built on top of the collected information using some simple Java2D code. The following diagram shows packages from the input .jar on a horizontal axis and external dependencies on a vertical axis. The darker a box's color is, the more times the package is referenced.

Figure 5. Dependencies in ant.1.6.5.jar, as discovered with
ASM
The complete code of this tool will be included into the next ASM release. It can be also obtained from ASM CVS.
You can skip this section if you haven't used ASM 1.x.
The major structural change in ASM 2.0 is that all J2SE 5.0
features are built into the ASM visitor/filter event flow. So the
new API allows you to deal with generics and annotations in a much
more lightweight and semantically natural way. Instead of
explicitly creating annotation attribute instances, we have
generics and annotation data within the event flow. For example, in
ASM 1.x, the ClassVisitor interface used the following
method:
CodeVisitor visitMethod(int access, String name,
String desc, String[] exceptions,
Attribute attrs);
This has been split into several methods in ASM 2.0:
MethodVisitor visitMethod(int access,
String name, String desc, String signature,
String[] exceptions)
AnnotationVisitor visitAnnotation(String desc,
boolean visible)
void visitAttribute(Attribute attr)
In the 1.x API, in order to define generics info, you'd have to
create specific instances of the SignatureAttribute,
and to define annotations, you'd need instances of the
RuntimeInvisibleAnnotations,
RuntimeInvisibleParameterAnnotations,
RuntimeVisibleAnnotations,
RuntimeVisibleParameterAnnotations, and
AnnotationDefault. Then you'd put these instances into
the attrs parameter of the appropriate visit
method.
In ASM 2.0, a new signature parameter has been added
to represent generics info. The new AnnotationVisitor
interface is used to handle all annotations. There is no need
to create an attrs collection, and annotation data is
more strictly typed. However, when migrating existing code,
especially when "adapter" classes have been used; it is necessary
to be careful and make sure that all methods overwritten from the
adapter are updated to new signatures, because the compiler will
raise no warnings.
|
There are several other changes introduced in ASM 2.0.
FieldVisitor and
AnnotationVisitor.CodeVisitor into MethodVisitor.visitCode() method added to the
MethodVisitor to easily detect first instruction.Constants interface renamed into
Opcodes.attrs package are
incorporated into ASM's event model.TreeClassAdapter and TreeCodeAdapter
are incorporated into the ClassNode and
MethodNode.LabelNode class to make elements of
instructions collection common type of
AbstractInsnNode.In general, it would be a good idea to run tool like JDiff and review the differences between the ASM 1.x and 2.0 APIs.
ASM 2.0 hides many bytecode complexities from the developer and allows one to efficiently work with Java features on a bytecode level. The framework allows you not only to transform and generate bytecode, but also to pull out significant details about existing classes. The API is being constantly improved--version 2.0 incorporates the generics and annotations introduced in J2SE 5.0. Since then, support for the new features introduced in Mustang (see "Java SE 6 Snapshot Releases") have been added to the ASM framework.
Eugene Kuleshov is an independent consultant with over 15 years of experience in software design and development.
Return to ONJava.com.
Copyright © 2009 O'Reilly Media, Inc.