Document Info:

This document is intended to document the DDLVM and its 'bytecode' thoroughly enough to enable someone to write an interpreter. It does not attempt to document the DDL language or the DDL base classes.

Document History:

Revision 0.1  Created Jules Bean
              Released 09/04/97
              Initial attempt to define DDLVM
         0.2  November 1997, changes due to Java rewrite and some rethinking...

The DDL Virtual Machine
=======================

The DDL Virtual Machine is extremely simple - it has very few opcodes and registers, and it is streamlined to deal almost exclusively with objects.

This document does not attempt to define the syntax of DDL; it attempts to define the DDLVM. Where appropriate, however, it refers to or uses examples of DDL syntax as an illustration.

1 Basics

1.1 Registers

There are only two registers in the DDLVM. There is, for example, no PC (as there are no jump instructions). There are no general purpose registers - all manipulations are done with objects.

  SP    The stack pointer
  This  A small concession to efficiency - the register 'This' refers to the current object, in an object context. If referred to in a class context, it should cause a run-time error.

1.2 The Stack

The stack is a stack of object references, and most operations just use the stack. How the references are implemented is implementation defined - pointers are probably inappropriate from a security point of view - indexes into a global object table might make more sense.

The stack is considered endless, subject to implementation constraints. Most variable references are stack-relative, for example (SP-4).

2 Objects

The DDLVM works almost entirely with objects, which it deals with via references. An object reference is permitted to take the value null (although to dereference a null reference is a run-time error, of course). Object references should be comparable - to see if they refer to the same object.

2.1 Properties

Objects are formed of properties and methods.
Properties are like instance variables in C++ - but note that in DDL, a property can only be an object reference. Unlike C++, when a subclass redefines a superclass's variable it does not create a new property, but simply refers to the previous one. This is normally done to give a new default initialiser (q.v.).

Class properties also exist - these are the only kind of global variable.

A property is 'fetched' by the ObjectProp instruction, which leaves the property on the stack. Similarly, a class property is fetched by ClassProp.

2.2 Methods

Object methods are called using the ObjectMeth instruction. Class methods (static methods) are called using ClassMeth.

When a method is called, its parameters are on the stack - pushed on it left-to-right, so the 'first' or 'leftmost' parameter is deepest. When it exits, it should decrement the stack pointer past all of these (and any automatic variables it has allocated in the meanwhile) and push the return value, if any, on the stack.

It is a run-time error ever to attempt to decrement the stack past its initial level in any particular function. A conscientious (paranoid?) DDLVM should in fact check not only this, but also that on return the SP is either in exactly the same place as at entry (in the case of a void function) or one higher (value-returning function).

When an object method is called, the register This is set to point at the object it is called on. Its value in the context of the calling function is saved and restored.

2.3 Constructors

An object is created by a call to one of its constructors. In fact, it uses a special opcode, Create - because a constructor is not a method (in the precise sense of the word).

Constructors are a bugger.

The reason constructors are a bugger is that a constructor must call the constructor of its immediate base class before it can run - 'this', and hence all instance variables, have no meaning before that point.
So, constructors are stored in two parts - which I call the pre-constructor and the post-constructor.

The code in the pre-constructor section is executed in an environment not unlike a class method - it has no 'This' register, and any such accesses cause run-time exceptions. It can create local variables in the standard way (on the stack) and these will still be accessible in the post-constructor. A unique privilege of pre-constructors is their ability to initialise const properties.

A constructor must declare which superclass constructor it is trying to call, by specifying the signature. At present, a constructor cannot conditionally decide which superclass constructor to call.

A pre-constructor must then leave on the stack the parameters for the chosen superclass constructor, which the interpreter will automatically call. The superclass constructor is not 'called' in a way which can be imitated with any opcode - it is a special sort of call, only used in constructor chaining (which is what this kind of system is called).

The post-constructor is then executed. The stack will look like it did at the end of the pre-constructor, except without the arguments for the superclass constructor. In addition, the This register is now set to point to a newly allocated object of the class of this constructor.

The reason for this complex system is that objects must be able to call their superclass constructors with calculated values.

Let me give an example of the order of events in a constructor chain. Suppose we have a class hierarchy Object--Shape--Circle, and Create is called to make a Circle object.
The order of events is as follows:

 o pre-constructors, in the order Circle--Shape--Object
 o An object of type Object is actually allocated, and 'This' points to it
 o Object's post-constructor
 o 'This' is promoted to type Shape (extra instance vars allocated)
 o Shape's post-constructor
 o 'This' is promoted to type Circle
 o Circle's post-constructor

Or at least that is (approximately) how C++ does it - in order to ensure that uninitialised variables are not accessed. DDL will do it this way until someone tells me it shouldn't...

2.3.1 Construction of instance properties

Instance properties (instance variables) can be initialised in two distinct ways.

A property can have a default value, defined in the class definition. This enables a virtual machine implementation to save memory by not allocating space for a particular instance's copy of that property until it is modified. In this case, one object is created at DDLVM start-up time, and referred to when any object does not have its own copy. This forms a sort of half-way house between a class static variable and an instance variable.

Failing that, a property is created as a null reference. Note that this is potentially confusing, as programmers may expect an integer to be constructed as a zero, but it is in fact a null reference. This is a safety-net: any property which has a meaningful default value should have it set as above - and if it does not, it should be initialised by a Create instruction (very likely from inside the constructor of the object which owns it). If neither of these has happened, it will cause a null reference exception when used.

2.4 Destructors

DDL has destructors like C++. They are chained like constructors. They are very rarely needed, since reference counting handles destruction of properties. They cannot be called explicitly (except by knowingly zeroing the last remaining reference to an object).
There may be a case for explicit calling of a destructor to null all references to an object - but I am unconvinced, since this would leave trailing null references all over the place.

2.5 Reference Counting

Objects keep track of how many references to them exist. When this count reaches zero, they self-destruct.

The reference count of an object is increased whenever a new reference to it is created. Objects are referred to from precisely three places: on the stack (as local variables and formal arguments to methods), inside other objects, or in class static storage.

The reference count of an object is decreased whenever a reference goes out of scope (the stack pointer is decreased past it, or the holding object removes the reference). The statement "myObject := null" will do this.

When the reference count of an object reaches zero, it destroys itself. It does this by calling its destructor, unreferencing all of its own (not its superclass's) instance properties (which may well trigger further destructions), demoting itself to its superclass, and repeating.

Circular referencing can lead to persistent objects (trivia: this is why Java uses garbage collection, not reference counting) - care must be taken to avoid such constructs if they will lead to memory leaks.

3 Classes

Classes exist at runtime in a DDLVM. They are something like objects, in the sense that they have properties (class properties) and methods (class methods). They are, however, *not* objects (at least, not from the viewpoint of DDL code). For example, there is always precisely one 'instance' of any class.

There is a type (a class, in fact) Class, which represents a class at runtime. However, it should not be considered to be that class; it is just something akin to a reference (confusing choice of word) or name for it.

3.1 Class Methods

Class methods are very simple. They are in some sense just global functions which happen to be 'packaged' in a certain class - which gives them certain access privileges.
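Looking back at the teardown rule in section 2.5 (call the destructor, unreference the class's own instance properties, demote to the superclass, repeat), the following is a minimal sketch in Python rather than a definitive implementation. The names DDLClass, DDLObject, add_ref, release and destroy are hypothetical and do not come from the DDLVM itself, and a destructor is modelled simply as a log entry.

```python
class DDLClass:
    """Hypothetical runtime record: a class name, its parent, its own properties."""

    def __init__(self, name, parent=None, own_props=()):
        self.name = name
        self.parent = parent
        self.own_props = tuple(own_props)  # declared by this class only


class DDLObject:
    """Hypothetical object: current class, property references, reference count."""

    def __init__(self, cls, log):
        self.cls = cls
        self.props = {}   # property name -> DDLObject (a missing key is a null reference)
        self.refs = 0
        self.log = log    # shared list recording the destructor order


def add_ref(obj):
    """A new reference to obj was created (null references are not counted)."""
    if obj is not None:
        obj.refs += 1


def release(obj):
    """A reference to obj went away; destroy obj when the count reaches zero."""
    if obj is None:
        return
    obj.refs -= 1
    if obj.refs == 0:
        destroy(obj)


def destroy(obj):
    """Destructor, then own properties, then demote to the superclass, and repeat."""
    cls = obj.cls
    while cls is not None:
        obj.log.append("~" + cls.name)         # stands in for running the destructor
        for name in cls.own_props:             # own properties only, not inherited ones
            release(obj.props.pop(name, None))
        cls = cls.parent                       # demote to the superclass
    obj.cls = None


# Assumed hierarchy, borrowed from the constructor example: Object--Shape--Circle.
log = []
object_cls = DDLClass("Object")
shape_cls = DDLClass("Shape", object_cls, ["centre"])
circle_cls = DDLClass("Circle", shape_cls, ["radius"])
point_cls = DDLClass("Point", object_cls)

point = DDLObject(point_cls, log)
circle = DDLObject(circle_cls, log)
circle.props["centre"] = point   # the circle holds the only reference to the point
add_ref(point)

add_ref(circle)   # e.g. a reference on the stack
release(circle)   # e.g. a Kill of that stack slot
```

Releasing the last reference to the circle runs Circle's destructor, then Shape's, destroys the point when Shape's 'centre' property is unreferenced, and finally runs Object's destructor. A cycle of references would never reach zero in this sketch, which is the leak warned about in section 2.5.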
3.2 Class Properties

Class properties can also be considered global properties, in some sense. They should normally not be externally visible, though. If a class has an instance property with a default value, this becomes a class property, but not one normally written to (perhaps never?) outside the class constructor.

3.3 Class initialisation

In order for classes to be able to give values to default properties (see discussion under 2.3.1) and also initial values to class properties, classes have a constructor. This may be called at DDLVM start-up time, or alternatively when the class is first referenced in any way (this allows for efficient DDLVMs, and also perhaps remote on-demand class loading a la Java). It will only be called once.

It is a very simple method - it takes no parameters and has no return value. Unlike an object constructor, it has only one part. This means that as soon as it has started, the class is deemed created, and operations can take place on it. It is the responsibility of this method, then, to ensure that it doesn't call any methods which depend on a variable it hasn't yet initialised. (If the DDLVM supports threads, or any other kind of distributed (over a network?) access to the same classes, this only applies to the thread actually in the class constructor. Any other threads attempting to access the class block until the constructor is over.) In particular, any const class properties should be initialised as soon as possible.
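As an illustration of the "first referenced" option, here is a minimal single-threaded sketch in Python. ClassRecord, ensure_initialised and get_prop are hypothetical names, and the blocking of other threads described above is not modelled. The class is marked as created before the constructor body runs, so a reference to the class from inside its own constructor does not run the constructor a second time.

```python
class ClassRecord:
    """Hypothetical runtime record for one DDL class."""

    def __init__(self, name, class_constructor):
        self.name = name
        self.class_constructor = class_constructor  # callable taking this record
        self.props = {}        # class properties by name
        self.created = False   # becomes True as soon as the constructor starts
        self.runs = 0          # how many times the constructor has run

    def ensure_initialised(self):
        """Run the class constructor on first reference, and only once."""
        if self.created:
            return
        self.created = True    # the class is deemed created from this point on
        self.runs += 1
        self.class_constructor(self)

    def get_prop(self, name):
        """Fetch a class property; any reference triggers initialisation."""
        self.ensure_initialised()
        return self.props.get(name)  # None stands in for a null reference


def kobold_class_constructor(cls):
    # Roughly what a compiler-generated class constructor for the Kobold class
    # of section 3.4 might do: set the class property and the default value.
    cls.props["numKobolds"] = 0
    cls.props["HP"] = 6
    # Referring to the class from inside its own constructor is allowed,
    # and must not start the constructor again.
    cls.props["check"] = cls.get_prop("numKobolds")


kobold = ClassRecord("Kobold", kobold_class_constructor)
runs_before = kobold.runs          # nothing has referenced the class yet
hp_default = kobold.get_prop("HP") # first reference runs the constructor
hp_again = kobold.get_prop("HP")   # later references do not
```

A DDLVM that initialises every class at start-up instead could call ensure_initialised on each record while loading; the once-only guard is the same in both cases.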
3.4 Example

The following example summarises the main structures in a class definition:

  class Kobold {
    String name;                    // property, uninitialised
    Integer HP=6;                   // property with default value
    Integer Mana;                   // property initialised in ctor below
    class Integer numKobolds=0;     // class property - C++ would use
                                    // static where I have class
    class Integer getNumKobolds();  // class method
    Kobold();                       // constructor
    Kobold(String n);               // alternative constructor with args
  }

Although no explicit class constructor is given (it would be defined 'class Kobold()'), the compiler will create one to initialise numKobolds to 0 and HP to 6.

4 File Structure

** This section is now out of date... DDO code may be replaced by a Java-serialised-object format **

Conventionally a .DDL file contains DDL source code (which this document does not attempt to define). A .DDO file, on the other hand, contains DDL 'bytecode'. It is better in fact to refer to this as object code, since it has more structure to it than the term 'bytecode' suggests.

A DDLVM object file, then, has the following hierarchical structure:

4.1 File

This, obviously, represents the whole file. A File is simply a sequence of Classes.

4.2 Class

Defines a single DDL class. It has the following structure:

  String                Class Name
  String                Parent Class Name
  List of Strings       Interfaces implemented
  List of Properties    The class's properties (i.e. static properties)
  List of Properties    Instance properties
  ClassConstructor      The class constructor
  List of Constructors  Constructors
  List of Methods       Class Methods
  List of Methods       Instance Methods

4.3 Property

Defines a property (variable) - either shared (class) or instance-specific (object). It has the following structure:

  String                The type of the property
  String                The name of the property

Note that a 'type' is more than just a class - it might include modifiers like 'const'.

*** Does not store access modifiers - these are not currently implemented. I didn't know how to store them, either - as a string as well?
***

4.4 Method

Defines a method - either a class method (which has global scope) or an object method (which has an implicit first parameter 'this'). Methods are identified by their signature, e.g.:

  'Integer getIntegerHP()'
  'Float calcProb(const Integer,const Integer)'

It has the following structure:

  String                Signature
  List of Instructions  Method code

4.5 Constructor

Defines an object constructor.

  String                Signature
  String                Superclass constructor to use
  List of Instructions  pre-constructor
  List of Instructions  post-constructor

4.6 ClassConstructor

ClassConstructors don't even have signatures...

  List of Instructions  code

4.7 List

  32-bit int            number of items
  ?                     Items...

4.8 String

  null-terminated

4.9 Instruction

Defined in more detail below, an instruction in general is one byte (the opcode), sufficient to distinguish the instructions, followed by instruction-dependent data:

  byte                  opcode
  ?                     data

5 'Fundamental' Objects

DDL doesn't have 'fundamental' types the way C++ or Java do... I wanted everything to be an object. However, some types are required by the language (a bit like the java.lang package) and some types cannot be coded in DDL - they need some kind of hardwired behaviour.

5.1 Boolean

The Boolean class is required by the conditional instructions. It is implemented in such a way that there can only ever be two objects of the class - one is the 'TRUE' constant, and one the 'FALSE'. I.e. it has no accessible constructor. All you can ever do is make references to the constants.

As an aside, it would be possible to implement the Boolean class entirely in DDL, using private subclasses.

5.2 Integer

The Integer object is internally handled just like the Boolean class. XXX*****

5.3 The Iterable interface

5.4 The Iterator object

5 Instructions

Here are described all the valid Instructions, with a short description, an 'assembly code' syntax, and a file structure.
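To make the 'file structure' part concrete, here is a minimal Python sketch that decodes a Method (section 4.4) from the List, String and Instruction encodings of sections 4.7 to 4.9. Several details are assumptions, because this document does not fix them: 32-bit values are read as big-endian signed integers, strings are decoded as Latin-1, an offset such as (SP-3) is stored as -3, and the opcode values OP_COPY and OP_KILL are invented purely for illustration. The function names are likewise hypothetical.

```python
import struct

# Hypothetical opcode values; the document does not assign any.
OP_COPY = 0x01
OP_KILL = 0x02


def read_string(buf, pos):
    """4.8 String: bytes up to a null terminator (Latin-1 is an assumption)."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("latin-1"), end + 1


def read_int32(buf, pos):
    """A 32-bit integer; big-endian and signed are assumptions."""
    (value,) = struct.unpack_from(">i", buf, pos)
    return value, pos + 4


def read_list(buf, pos, read_item):
    """4.7 List: a 32-bit item count followed by that many items."""
    count, pos = read_int32(buf, pos)
    items = []
    for _ in range(count):
        item, pos = read_item(buf, pos)
        items.append(item)
    return items, pos


def read_instruction(buf, pos):
    """4.9 Instruction: one opcode byte, then instruction-dependent data."""
    opcode = buf[pos]
    pos += 1
    if opcode == OP_COPY:   # Copy carries a 32-bit SP offset
        offset, pos = read_int32(buf, pos)
        return ("Copy", offset), pos
    if opcode == OP_KILL:   # Kill carries no data
        return ("Kill",), pos
    raise ValueError("unknown opcode %#x at byte %d" % (opcode, pos - 1))


def read_method(buf, pos):
    """4.4 Method: a signature String followed by a List of Instructions."""
    signature, pos = read_string(buf, pos)
    code, pos = read_list(buf, pos, read_instruction)
    return (signature, code), pos


# A hand-built method body: Copy (SP-3) followed by Kill.
sample = (
    b"Integer getIntegerHP()\x00"
    + struct.pack(">i", 2)
    + bytes([OP_COPY]) + struct.pack(">i", -3)
    + bytes([OP_KILL])
)
method, end = read_method(sample, 0)
```

Each reader returns the decoded value together with the position of the next unread byte, so the hierarchical structures of section 4 can be decoded by composing these functions in the order the fields are listed.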
DDL VM code can be neatly implemented by setting up a class library which describes all the instructions, and also the various constructs in a DDOFile. They can be given methods to write to and from a file, both in object form and in 'assembler' form, and to execute themselves - and then this hierarchy can be shared by the compiler application and the interpreter application.

Note that there is a one-to-many relationship between instructions and opcodes - any one instruction may have several forms which are distinguished by having different opcodes.

5.1 Simple Instructions

5.1.1 Copy

Create a new reference (on the top of the stack) to an object already on the stack. Used to get parameters on the stack for a function call:

  Copy (SP-3)

  Structure:
    byte     opcode
    32bit    SP offset of reference to copy

5.1.2 Kill

Decrement SP and unreference the object there (if not null).

  Kill

  Structure:
    byte     opcode

5.1.3 Swap

Swap two references on the stack. Used normally to get a return value in place.

  Swap (SP-2),(SP-5)

  Structure:
    byte     opcode
    32bit    SP offset of reference 1
    32bit    SP offset of reference 2

5.2 Object Instructions

Instructions which operate on objects.

5.2.1 Create

Creates an object, using the constructor supplied. Constructor arguments should in general be on the stack, and the object will be put on the stack on successful construction. There are also special versions of Create to handle native datatypes.

  Create "Kobold(String,int)"

  Structure:
    byte     opcode
    String   signature

  CreateInt 5

  Structure:
    byte     opcode
    32bit    integer value

  CreateStr "Hello"

  Structure:
    byte     opcode
    String   string value

  CreateFlt 5.0

  Structure:
    byte     opcode
    IEEE ??-bit floating point value

5.2.2 ClassMeth

Call a class static method. It is not currently possible to call a method on a class name stored in a variable.

  ClassMeth "Kobold","Integer getNumKobolds()"

  Structure:
    byte     opcode
    String   class name
    String   method signature

5.2.3 ObjectMeth

Call an object method. All object methods are virtual.
There are two forms of this opcode - one works on 'This' and the other on an object on the stack:

  ObjectMeth This,"Draw()"

  Structure:
    byte     opcode
    String   method signature

  ObjectMeth (SP-3),"Draw()"

  Structure:
    byte     opcode
    32bit    SP offset
    String   method signature

5.2.4 ObjectProp

Fetch (a reference to) an object property onto the stack. Again, the same two forms:

  ObjectProp This,"HP"

  Structure:
    byte     opcode
    String   property name

  ObjectProp (SP-3),"HP"

  Structure:
    byte     opcode
    32bit    SP offset
    String   property name

5.2.5 ClassProp

Fetch a class property.

  ClassProp "Math","randseed"

  Structure:
    byte     opcode
    String   class name
    String   property name

5.2.6 SetObjectProp

Set an object property. Note that this is not so much a case of changing a value as changing which object a property refers to. Again, two forms:

  SetObjectProp This,"HP",(SP-4)

  Structure:
    byte     opcode
    32bit    SP offset of new value
    String   property name

  SetObjectProp (SP-2),"HP",(SP-4)

  Structure:
    byte     opcode
    32bit    SP offset of object
    32bit    SP offset of new value
    String   property name

5.2.7 SetClassProp

Set a class property:

  SetClassProp "Math","randseed",(SP-3)

  Structure:
    byte     opcode
    32bit    SP offset of new value
    String   class name
    String   property name

5.3 Compound Instructions

The DDL VM code has compound statements built in - If, For and While. This keeps the code structured, and means we can do entirely without jump instructions.

If and While are implemented using 'conditionals'. A conditional is a (normally short) block of code which, when it executes, leaves a reference to a Boolean on the stack - if it leaves an object of any other type, it is a run-time error.

*** I'm not sure about this... maybe Boolean is an interface... ***

A For statement relies on the fact that its operand is in fact a List. If it is not, it will cause a runtime error.
*** This should perhaps be broadened to allow any object which implements an interface 'Iterable' ***

5.3.1 If

Conditionally execute one of two blocks of code, based on the value of a condition. See the section on compound instructions above.

  If {
    instructions...
  } Then {
    instructions...
  } Else {
    instructions...
  }

  Structure:
    byte                  opcode
    List of Instructions  condition clause
    List of Instructions  'then' clause
    List of Instructions  'else' clause

5.3.2 For

Repeatedly execute a block of code, with a certain variable (read: stack location) iterating through a list. The block is executed once for each member of the list, with the top of the stack (the loop variable) a reference to the current member. If the object supplied as the list is not actually a List, a runtime error is caused.

  For (SP-6) {
    instructions...
  }

  Structure:
    byte                  opcode
    32bit                 SP offset of list
    List of Instructions  body of loop

5.3.3 While

Repeatedly execute a block of code while a condition is true. The condition is checked before executing the code each time.

  While {
    instructions...
  } Do {
    instructions...
  }

  Structure:
    byte                  opcode
    List of Instructions  condition clause
    List of Instructions  'do' clause

6 Worked example

6.1 DDL

  Boolean DamageMons(Monster mons,Dice damage) {
    Integer actual;
    for Integer i in (1..damage.num) {
      actual += Math.Random(damage.sides);
    }
    actual += damage.plus;
    if (actual < 0) {
      actual = 0;
    }
    return mons.Damage(actual);
  }

6.2 Symbolic Assembly

  ; Function "Boolean DamageMons(Monster,Dice)"
  ; Symbolic parameter names : mons, damage
  Create "Integer()"          ;Leaves return value on stack - henceforth 'actual'
  CreateInt 1                 ;Allocate Integer object for constant 1
                              ;'anon1'
  ObjectProp damage,"num"     ;Puts requested property on stack
  ObjectMeth anon1,"List operator..(Integer)"
                              ;So, calls anon1.operator..(damage.num) since
                              ;damage.num is on the stack.
                              ;Creating a list var on the stack, 'anonlist'
  For (anonlist) {            ;Loop variable called 'i' (but unused anyway)
    ObjectProp damage,"sides"
    ClassMeth "Math","Integer Random(Integer)"
                              ;Math is a global object...
    ObjectMeth actual,"operator+=(Integer)"
  }
  Kill                        ;anonlist
  Kill                        ;anon1
  ObjectProp damage,"plus"    ;the 'actual += damage.plus' statement
  ObjectMeth actual,"operator+=(Integer)"
  If {
    CreateInt 0
    ObjectMeth actual,"Boolean operator<(Integer)"
  } Then {
    CreateInt 0               ;call it anon0
    SetObjectProp actual,anon0
    Kill
  }
  ; as luck would have it, actual is already
  ; stack-uppermost, so we call method straight away
  ObjectMeth mons,"Boolean Damage(Integer)"
  ; leaves function result on stack (uppermost)
  ; call it 'result' (!)
  Swap mons,result            ;puts result lowermost
  Kill
  Kill                        ;mons and damage in that order

6.3 Pure Assembly

  Create "Integer()"
  CreateInt 1
  ObjectProp (SP-2),"num"
  ObjectMeth (SP-1),"List operator..(Integer)"
  For (SP) {
    ObjectProp (SP-3),"sides"
    ClassMeth "Math","Integer Random(Integer)"
    ObjectMeth (SP-3),"operator+=(Integer)"
  }
  Kill
  Kill
  ObjectProp (SP-1),"plus"
  ObjectMeth (SP-1),"operator+=(Integer)"
  If {
    CreateInt 0
    ObjectMeth (SP-1),"Boolean operator<(Integer)"
  } Then {
    CreateInt 0
    SetObjectProp (SP-1),(SP)
    Kill
  }
  ObjectMeth (SP-2),"Boolean Damage(Integer)"
  Swap (SP-2),(SP)
  Kill
  Kill
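
The last three instructions of the worked example can be replayed with a small Python sketch of the simple instructions from section 5.1 together with the reference counting of section 2.5. This is only an illustration under stated assumptions: Obj and Stack are hypothetical names, (SP) is taken to mean the top of the stack and (SP-n) the slot n below it (as the pure assembly above suggests), offsets are passed as zero or negative numbers, and destruction is reduced to setting a flag.

```python
class Obj:
    """Hypothetical object with a reference count and a 'destroyed' flag."""

    def __init__(self, name):
        self.name = name
        self.refs = 0
        self.destroyed = False


class Stack:
    """Stack of object references; offset 0 is (SP), offset -n is (SP-n)."""

    def __init__(self):
        self.slots = []

    def _index(self, offset):
        index = len(self.slots) - 1 + offset
        if offset > 0 or index < 0:
            raise IndexError("stack offset out of range: %d" % offset)
        return index

    def push(self, obj):
        """Every stack slot counts as one reference (null is not counted)."""
        if obj is not None:
            obj.refs += 1
        self.slots.append(obj)

    def copy(self, offset):
        """Copy: push a new reference to an object already on the stack."""
        self.push(self.slots[self._index(offset)])

    def kill(self):
        """Kill: decrement SP and unreference the object there (if not null)."""
        if not self.slots:
            raise IndexError("Kill on an empty stack")
        obj = self.slots.pop()
        if obj is not None:
            obj.refs -= 1
            if obj.refs == 0:
                obj.destroyed = True   # stands in for the destructor chain

    def swap(self, offset_a, offset_b):
        """Swap: exchange two references; reference counts are unchanged."""
        i, j = self._index(offset_a), self._index(offset_b)
        self.slots[i], self.slots[j] = self.slots[j], self.slots[i]


# State just after ObjectMeth (SP-2),"Boolean Damage(Integer)" has returned.
mons, damage, result = Obj("mons"), Obj("damage"), Obj("result")
mons.refs = 1            # assumption: the caller still holds its own reference
stack = Stack()
for obj in (mons, damage, result):
    stack.push(obj)

stack.swap(-2, 0)        # Swap (SP-2),(SP): puts result lowermost
stack.kill()             # unreferences mons, which survives
stack.kill()             # unreferences damage, whose count reaches zero
```

Only the return value is left on the stack, one slot above the level at entry minus the two parameters, which is the exit condition section 2.2 asks a paranoid DDLVM to check.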