tangentum technologies    
 


Features

XJBC is an XML-based assembly language for the Java VM. In contrast to other assemblers and bytecode libraries, it was designed specifically with automated processing in mind. Its main purpose is to give a logical representation of the contents of Java bytecode in XML, a uniform representation system that comes with a plethora of existing tools.

XJBC can be used for analysing, transforming, and generating Java bytecode. The disassembler transforms Java bytecode to XJBC (XML) and therefore acts as a starting point of tool chains for analysing and transforming bytecode. The assembler transforms XJBC (XML) to Java bytecode and therefore acts as an endpoint of tool chains for transforming or generating bytecode.

The language and its tools offer a number of features to eliminate the complexity of programming with Java bytecode and to enable its automated processing.

1. Fully based on XML
2. Constant pool completely hidden
3. Automatic calculation of "max-stack"
4. Numbered parameters and named locals
5. Labelled instructions
6. Unified families of instructions
7. Complete Unicode support
8. Simplified selection of elements by orthogonal attributes
9. Java bytecode easily extensible by user-defined attributes
10. Tools available as library and on command-line

1. Fully based on XML

Problem: Each assembly language for the Java VM defines its own syntax to represent Java bytecode, and that syntax is the main technical problem of the other assembly languages. They are designed syntactically like the assembly languages in use twenty or thirty years ago and ignore the syntactic advances that XML has brought since. Parsers and generators for them, not to mention transformers, are rare. The future development of automated tool chains that use them is therefore severely limited.

Solution: XJBC breaks with the traditional design of assembly languages and is based on XML from the beginning. The plethora of tools developed for XML is therefore available for XJBC out of the box, which profoundly simplifies future developments based on XJBC.

2. Constant pool completely hidden

Problem: Java class-files contain an explicit pool that collects the constants, such as strings, numbers, and names, used in the definition of the class and its members. This keeps class-files compact, because each constant is stored only once in the pool and referred to by index from elsewhere. But handling indices and following their indirection makes programming unnecessarily complex and error-prone.

Solution: XJBC represents the bytecode logically and eliminates indices and indirection completely. Each string, number, name, or other constant is represented directly as the value of an XML-attribute. This makes programming with XJBC straightforward and more reliable.
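The bookkeeping that XJBC hides can be pictured with a small Java sketch of constant interning. It is not XJBC's implementation: the class and method names are invented for illustration, and only string constants are handled. The one fact taken from the class-file format is that constant-pool indices start at 1.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch (hypothetical names): a pool that stores each string constant only once. */
final class ConstantPoolSketch {
    private final Map<String, Integer> indexByValue = new HashMap<>();
    private final List<String> entries = new ArrayList<>();

    /** Returns the index of the constant, adding it to the pool on first use. */
    int intern(String value) {
        Integer existing = indexByValue.get(value);
        if (existing != null) {
            return existing;
        }
        entries.add(value);
        int index = entries.size(); // constant-pool indices start at 1, not 0
        indexByValue.put(value, index);
        return index;
    }

    /** Number of distinct constants stored so far. */
    int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        ConstantPoolSketch pool = new ConstantPoolSketch();
        System.out.println(pool.intern("java/lang/Object"));
        System.out.println(pool.intern("toString"));
        System.out.println(pool.intern("java/lang/Object")); // same index as the first call
    }
}
```

An assembler in this style would call `intern` for every attribute value it meets, so the author of the XML never sees an index.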

3. Automatic calculation of "max-stack"

Problem: The "Code" attributes represent the executable instructions of methods inside the Java bytecode. Each of these attributes has a field called "max-stack", which gives the maximum depth of the operand stack at any point during execution of the corresponding method. This field is checked by the bytecode verifier of the Java VM. The value of "max-stack" mustn't be too low, which would yield incorrect code, and shouldn't be too high, which would result in inefficient code. It is therefore crucial for the assembler to generate correct and optimal values.

Solution: The assembler of XJBC calculates correct and optimal values for "max-stack" automatically, enabling it to become part of an automated tool chain.
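The core of such a calculation can be sketched in Java for the simplest case, a straight-line sequence without branches. This is an illustration under stated assumptions, not XJBC's algorithm: the class name and the `{pops, pushes}` encoding are invented here, and a real assembler also has to follow branches and exception handlers.

```java
/** Sketch (hypothetical names): maximum operand-stack depth of branch-free code. */
final class MaxStackSketch {

    /**
     * Each entry is {pops, pushes} for one instruction, counted in stack units
     * (a long or double counts as two units).
     */
    static int maxStack(int[][] effects) {
        int depth = 0;
        int max = 0;
        for (int[] effect : effects) {
            depth -= effect[0];
            if (depth < 0) {
                throw new IllegalStateException("operand stack underflow");
            }
            depth += effect[1];
            max = Math.max(max, depth);
        }
        return max;
    }

    public static void main(String[] args) {
        // (1 + 2) * 3 as int instructions:
        int[][] code = {
            {0, 1}, // iconst_1
            {0, 1}, // iconst_2
            {2, 1}, // iadd
            {0, 1}, // iconst_3
            {2, 1}, // imul
            {1, 0}, // ireturn
        };
        System.out.println(maxStack(code));
    }
}
```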

4. Numbered parameters and named locals

Problem: The local variables of a method, as defined by the Java Virtual Machine Specification, are identified by their index into an array of integer-sized slots. Values of type long and double need two consecutive slots, so the index of the second slot cannot be used to store or identify another local. There is therefore no direct one-to-one mapping from the number of a "local" to its index. Beyond that, the name "local variables" is somewhat misleading, because they contain not only the local variables as understood in Java source-code but also the values of the method's parameters. This makes programming Java bytecode overly complex and error-prone.

Solution: In XJBC the term local denotes the variables local to a method as usual, and the term param denotes the parameters of the method. The locals are identified by name, and the params are identified by index, because the parameters are given as a list with a "natural" order, and the locals themselves form a simple and unordered mapping from a name to a value. This simplifies programming significantly, because new local variables may be invented on-the-fly without disruption of program-semantics. The assembler of XJBC therefore calculates the value of the field "max-locals" of the "Code" attribute automatically.
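How numbered params and named locals could be mapped onto slot indices is sketched below in Java. The class and method names are hypothetical and the allocation order is an assumption, not XJBC's documented behaviour. The sketch relies only on two facts from the Java Virtual Machine Specification: long ("J") and double ("D") occupy two slots, and instance methods keep `this` in slot 0.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch (hypothetical names): assigning slot indices to params and named locals. */
final class LocalSlotsSketch {
    private final Map<String, Integer> slotByName = new LinkedHashMap<>();
    private int next;

    LocalSlotsSketch(boolean isStatic) {
        next = isStatic ? 0 : 1; // instance methods reserve slot 0 for "this"
    }

    /** long (J) and double (D) take two slots, everything else takes one. */
    static int width(String descriptor) {
        return descriptor.equals("J") || descriptor.equals("D") ? 2 : 1;
    }

    /** Reserves the slots of the next parameter and returns its first slot. */
    int addParam(String descriptor) {
        int index = next;
        next += width(descriptor);
        return index;
    }

    /** Returns the slot of a named local, allocating it on first use. */
    int local(String name, String descriptor) {
        Integer existing = slotByName.get(name);
        if (existing != null) {
            return existing;
        }
        int index = next;
        next += width(descriptor);
        slotByName.put(name, index);
        return index;
    }

    /** The value an assembler would write into "max-locals". */
    int maxLocals() {
        return next;
    }

    public static void main(String[] args) {
        LocalSlotsSketch slots = new LocalSlotsSketch(true); // static method (long, int)
        System.out.println("param 0 -> slot " + slots.addParam("J"));
        System.out.println("param 1 -> slot " + slots.addParam("I"));
        System.out.println("sum     -> slot " + slots.local("sum", "D"));
        System.out.println("i       -> slot " + slots.local("i", "I"));
        System.out.println("max-locals = " + slots.maxLocals());
    }
}
```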

5. Labelled instructions

Problem: Manually calculating instruction offsets for branch-instructions is tedious and error-prone. An assembler should define labels and prevent the programmer from making unnecessary mistakes.

Solution: XJBC defines two XML-attributes for instructions: one called "label" for each instruction, and one called "goto" for each branch-instruction. The "label"-attribute marks an instruction as a target of a branch-instruction, which has this mark in its "goto"-attribute. Because branch-instructions in Java bytecode may differ in size depending on the size of their offsets, the assembler of XJBC automatically tries to minimize the offsets and thereby to generate the smallest possible branch-instructions.
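Label resolution is commonly done in two passes, which the following Java sketch illustrates. It is a simplification with invented names, not XJBC's code: instruction sizes are taken as fixed, so the choice between short and wide branch forms is left out. The offset of a branch is counted from the address of the branch instruction itself, as in Java bytecode.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch (hypothetical names): resolving "goto" labels to relative byte offsets. */
final class LabelSketch {

    /** One instruction: optional label, size in bytes, optional branch target. */
    static final class Insn {
        final String label;
        final int size;
        final String target;

        Insn(String label, int size, String target) {
            this.label = label;
            this.size = size;
            this.target = target;
        }
    }

    /** Returns the relative offset of each branch, and 0 for non-branches. */
    static int[] resolve(Insn[] code) {
        // Pass 1: record the byte address of every instruction and label.
        Map<String, Integer> addressByLabel = new HashMap<>();
        int[] address = new int[code.length];
        int pc = 0;
        for (int i = 0; i < code.length; i++) {
            address[i] = pc;
            if (code[i].label != null) {
                addressByLabel.put(code[i].label, pc);
            }
            pc += code[i].size;
        }
        // Pass 2: offset = address of the target - address of the branch.
        int[] offsets = new int[code.length];
        for (int i = 0; i < code.length; i++) {
            if (code[i].target != null) {
                Integer targetAddress = addressByLabel.get(code[i].target);
                if (targetAddress == null) {
                    throw new IllegalArgumentException("unknown label: " + code[i].target);
                }
                offsets[i] = targetAddress - address[i];
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        Insn[] code = {
            new Insn("loop", 3, null),  // iinc
            new Insn(null, 1, null),    // iload_1
            new Insn(null, 3, "loop"),  // ifne loop
            new Insn(null, 3, "end"),   // goto end
            new Insn("end", 1, null),   // return
        };
        int[] offsets = resolve(code);
        System.out.println("ifne offset: " + offsets[2]);
        System.out.println("goto offset: " + offsets[3]);
    }
}
```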

6. Unified families of instructions

Problem: The Java Virtual Machine Specification defines the structure of the class-files and the contained instructions. Some sub-sets of instructions in the Java bytecode form families, i.e. they have the same function, but differ in their concrete representation. As an example take the function to push an integer value onto the operand stack. It has the following concrete physical representations as opcodes: "iconst_m1", "iconst_0", "iconst_1", "iconst_2", "iconst_3", "iconst_4", "iconst_5", "bipush", "sipush", "ldc", and "ldc_w". To handle them explicitly is complex and error-prone.

Solution: XJBC represents instructions logically and therefore defines only one kind of element for each family of instructions, e.g. "iconst" for the example from above. The assembler of XJBC automatically generates in each case the most efficient instruction, which takes this burden from the programmer and/or transformation program.
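For the integer-constant family, the selection an assembler has to make can be sketched in Java as follows. The class name and the `poolIndex` parameter are assumptions made for this illustration. The value ranges follow the Java Virtual Machine Specification: "bipush" takes a signed byte, "sipush" a signed short, and "ldc" a one-byte constant-pool index.

```java
/** Sketch (hypothetical names): choosing the shortest instruction for an int constant. */
final class IconstSketch {

    /**
     * Returns the mnemonic for a logical "iconst". poolIndex is the constant-pool
     * index the value would receive; it only matters when the value does not fit
     * into an immediate operand.
     */
    static String opcodeFor(int value, int poolIndex) {
        if (value == -1) {
            return "iconst_m1";                    // 1 byte
        }
        if (value >= 0 && value <= 5) {
            return "iconst_" + value;              // 1 byte
        }
        if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) {
            return "bipush";                       // 2 bytes
        }
        if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) {
            return "sipush";                       // 3 bytes
        }
        return poolIndex <= 255 ? "ldc" : "ldc_w"; // 2 or 3 bytes
    }

    public static void main(String[] args) {
        for (int value : new int[] {-1, 3, 100, 1000, 100000}) {
            System.out.println(value + " -> " + opcodeFor(value, 1));
        }
    }
}
```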

7. Complete Unicode support

Problem: Java uses Unicode to encode characters. Each string in Java bytecode therefore contains Unicode characters, most of them valid, but some of them invalid.

Solution: XJBC defines a mapping of Unicode characters to representations that are valid and usable with XML, and vice versa. The representations used with XML are borrowed from their representations in Java source-code, and are therefore familiar to every Java programmer. Even invalid Unicode-characters are correctly encoded and decoded, so the transformation from Java bytecode to XJBC and back is lossless.
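The exact mapping is not given here, so the following Java sketch only shows one way such a lossless scheme can work. It assumes Java-style `\uXXXX` escapes, and it escapes every UTF-16 code unit that XML 1.0 does not accept as a plain character, every surrogate, and the backslash itself. The class name and these choices are assumptions for illustration, not XJBC's definition.

```java
/** Sketch (hypothetical scheme): lossless escaping of strings for use in XML. */
final class UnicodeEscapeSketch {

    /** True for code units that XML 1.0 accepts directly (surrogates excluded). */
    static boolean isPlainXmlChar(char c) {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD);
    }

    /** Replaces the backslash and every non-plain code unit by a 4-digit hex escape. */
    static String encode(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' || !isPlainXmlChar(c)) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Inverse of encode(): every backslash in the input starts a 6-character escape. */
    static String decode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\') {
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A NUL character and a lone surrogate, neither of which XML can carry directly.
        String original = "a" + (char) 0 + (char) 0xD800 + "b";
        String encoded = encode(original);
        System.out.println(encoded);
        System.out.println(decode(encoded).equals(original));
    }
}
```

Escaping the backslash as well is what makes the round trip unambiguous: after encoding, a backslash can only ever mean the start of an escape.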

8. Simplified selection of elements by orthogonal attributes

Problem: Modification of Java bytecode requires the identification of the items to modify. There has to be a selection mechanism that defines the context in which the modification has to take place. This is especially necessary for automated transformations.

Solution: In XJBC the attributes always have the same meaning independent of the element they are part of. For example an object-type is always represented as the pair {"package='...'", "type='...'"}, whether in the defining part of a class or in the invocation-instruction of a method. This simplifies query-language statements, such as XPath expressions, that select specific elements inside the Java bytecode, i.e. its XJBC representation.
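A single XPath expression can then find every use of a type, whatever element it occurs in. The Java sketch below uses the standard javax.xml.xpath API. Only the attribute names "package" and "type" come from the text above; the element names in the sample document ("class", "method", "invoke", "new") are invented for illustration and are not XJBC's actual vocabulary.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Sketch: selecting all elements that refer to one object-type. */
final class SelectByTypeSketch {

    // Hypothetical XJBC-like document; the element names are assumptions.
    static final String SAMPLE =
        "<class package='demo' type='Hello'>"
      + "<method name='main'>"
      + "<invoke package='java.io' type='PrintStream' name='println'/>"
      + "<new package='java.lang' type='StringBuilder'/>"
      + "<invoke package='java.lang' type='StringBuilder' name='append'/>"
      + "</method>"
      + "</class>";

    /** Counts the elements, of any name, that carry the given package/type pair. */
    static int countUses(String xml, String pkg, String type) {
        // Assumes pkg and type contain no quote characters.
        String query = "//*[@package='" + pkg + "' and @type='" + type + "']";
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList hits = (NodeList) xpath.evaluate(
                query, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            return hits.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("cannot evaluate " + query, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countUses(SAMPLE, "java.lang", "StringBuilder"));
    }
}
```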

9. Java bytecode easily extensible by user-defined attributes

Problem: The Java bytecode mainly consists of attributes which define its contents. The most important attributes are predefined by Sun Microsystems and understood by the Java-complying VMs, like the "Code"-attribute or the "SourceFile"-attribute. The Java Virtual Machine Specification also allows users to define their own attributes as long as they don't conflict with the predefined attributes. This is useful for carrying some additional information with the bytecode, especially for tools that want to store metadata in it, say specific debugging information, the time and date of the construction of the class, additional design-data for round-trip engineering, and so on. Injecting and extracting this additional information has been painful or sometimes impossible in the past, because it was necessary to write special generators and parsers for the injected and extracted data. These generators and parsers depended on the APIs of the libraries used and were therefore not portable.

Solution: With XJBC, defining and using additional attributes in Java bytecode is as simple as defining and using additional elements and attributes in XML, because the two are now the same thing. If you add new elements to an XJBC (XML) representation, the assembler of XJBC automatically encodes the additional information as correct Java bytecode using the constant-pool, and the disassembler automatically decodes it when restoring the bytecode to its XML-representation. Converting XJBC to Java bytecode and back to XJBC loses almost no information. There are only very few places inside the XML-representation where additional elements and attributes are stripped off. These places are thoroughly documented.

10. Tools available as library and on command-line

Problem: Tools need to be available as an API to be embeddable into applications, and on the command-line to be easily callable in scripting scenarios.

Solution: The tools of XJBC are realized as a library and are also callable on the command-line. The disassembler, which produces XML, is realized as an implementation of "XMLReader", and the assembler, which consumes XML, as an implementation of "ContentHandler"; both conform to the SAX2-standard. For the command-line two classes "org.xjbc.asm" and "org.xjbc.disasm" are defined which take their input from "stdin" and put their output to "stdout", so both are usable on the command-line as simple filters.
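Because both tools speak SAX2, they plug into any other SAX2 component. The Java sketch below shows the wiring with stand-ins from the standard library only, since the XJBC classes themselves are not shown in this text: the JDK's XML parser plays the "XMLReader" and a small element counter plays the "ContentHandler". All class names in the sketch are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Sketch: connecting a SAX2 XMLReader to a ContentHandler. */
final class SaxChainSketch {

    /** Stand-in consumer: counts the elements it is told about. */
    static final class ElementCounter extends DefaultHandler {
        int count;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            count++;
        }
    }

    /** Feeds the document through reader -> handler and returns the element count. */
    static int countElements(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            ElementCounter counter = new ElementCounter();
            reader.setContentHandler(counter);
            reader.parse(new InputSource(new StringReader(xml)));
            return counter.count;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("SAX processing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countElements("<class><method><return/></method></class>"));
    }
}
```

In a real tool chain either end of this pair could be replaced by an XJBC tool, with XSLT transformers or SAX filters placed in between.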