Project Gilgamesh: Simkin Archives

June 26, 2003

More Simkin overloading.

Thinking some more about overloading:

Evaluable is a bad name for the interface. I think I'll probably extract a number of overloaded interfaces and call them

OverloadValue (was Evaluable)
OverloadNumOp (was Overloadable)
OverloadCompare
OverloadStringOp
OverloadBooleanOp

Thinking about my interest in Simkin (modeling spreadsheet Cells), I came up with another issue: a Spreadsheet could contain

a String
a Boolean
an Integer

So far, so good: we just implement OverloadValue to return the appropriate thing. But as it might also implement some other custom classes (for example to get around the lack of floating point support in MIDP 1.n)

GilgameshDecimal
GilgameshDate

How do we return these? We could implement OverloadNumOp instead. But it means that the Cell class has to handle even simple cases like adding Integers. This means reimplementing all the good work handled for us by the Simkin Interpreter!

So... I've come up with another class

OverloadProxy

This Interface contains a single method public Object getProxy(). When an OpNode is encountered, if either side of the operator expression implents OverloadProxy then it will be substituted for the object returned by this method. The returned object could be an Integer, String etc. (which are handled internally by Simkin) or it could be an overloaded object of some sort which handles some or all of its operators by itself. (for example, the putative GilgameshDecimal object would implement OverloadNumOp).

All this means that we can still use a containing object such as a Cell as an Object, and call methods on it; but also substitute it for actual values used in calculations.

=A1.address()   // calls address method on object A1
=A1 + A2        // converts A1 and A2 into their proxy values and operates 
                // on these.

Note that if the Cell contained a special value that it wants to handle itself, it can always return this in response to a call to getProxy()!

(Any similarity to Perl's overload mechanism may not be entirely coincidental). Perl operators can be overloaded one by one. But as far as I can see, this would require a Java interface for every operator! So grouping them into functional areas seems to be a good compromise.

Posted by osfameron at 01:51 PM | Comments (0) | TrackBack

More thoughts on Simkin

With the work on ranges nearing completion, and the expression order evaluator theoretically complete (if not actually implemented...) I've been looking at expression evaluators again.

Or, to be perfectly honest, at Simkin. It appeals to me because:

It's small (22Kb)
It's ported to J2ME
It's open sourced

(+ I met Simon Whiteside at the Exposium 2003 and he talked enthusiastically about it which was inspiring). The documentation for Simkin is nice, especially the API which is very well javadoc'd, but much of the background information is a bit light on detail for someone considering embedding it in their application. I'd have liked at least these documents:

A user-centric tutorial on the language, to demonstrate Simkin's strengths.
A deeper discussion on the ways that Simkin can be called (interpreters, contexts, XML, databases etc.)
An architectural overview of the entire system.
A tutorial on the Executable interface, and how to retrofit it to an existing Java object in order to script it.

I'd like to help document some of these points (certainly the last), given sufficient time...

The good news

Some aspects of Simkin fit perfectly with what I want out of Gilgamesh. An expression beginning with '=' can be parsed, and run in the context of an object, like a Cell. That object is now responsibly for dispatching methods and resolving variables relating to it. An example would be:

Cell A1: =sum(B2, C2:D3)

The cell A1 will be asked to resolve the name "A2" and pass back an object to the Simkin interpreter. This is nice, as it means we can lazily provide cells and cell ranges to formulas! It also makes it trivial to implement the semantics I've suggested of dynamic cell references by column name:

Cell A1: =sum(B, C2:D3)

In this case, A1 will take the Column ref 'B' and return the Cell object in the same row (B1).

Through a mysterious serendipity, colon delimited expressions are also passed to the object so C2:D3 in the example just works! Without customizing the parser at all! (However row ranges, like 1:10 won't work. Sigh. I guess a kludge like "row1:row10" would be ok?)

The sum() method will also be called on the cell. (That's not particularly useful in this case. But the cell can just pass this up to Gilgamesh's function dispatcher.)

The bad news

For some reason, I'd imagined that cell names would be resolved to objects at compile time. But in Simkin the cell name is stored in the parse tree, and the object is only resolved at run-time. This means that if we insert or delete rows and columns, the name could risk pointing to the wrong cell!

But there is nothing to prevent the Cell caching the name of the reference with the object itself in a Hashtable. The next time Simkin asks for "A1", the Cell could return the right object, even though in the mean time it's been moved to Z100!

The problem finally is what happens when the user tries to edit a formula containing references to cells that have changed addresses? We can't just return the formula string that was originally passed in. The user might have input "=sum(B2, C2:D3)", but we may need to return "=sum(A1, G10:H15)". I can think of two solutions:

When the user wants to edit a formula: Walk the parse tree substituting any variable matching a cached name with the new name belonging to the cached object. (Now we need to deparse the parse tree and return the String to the user...)
Splice in the new names for the cell ranges into the original formula String. This would save some faffing with parse-tree walking (not to mention the deparsing code). But I've not worked out if it's possible to find out which position in the formula string the references are.

Though I the second version would be faster, it requires a better understanding of the compiler than I've managed to glean so far, and also could mean storing the column offset of every parse-node, which might be too heavyweight. In the meantime, I've started work on a basic deparser.

Evaluable

Simkin will automatically cast variables to float, int, string or boolean as appropriate. Strings will get parsed into numbers; Booleans are true if they are non-zero numbers or the string "true" etc.. This is quite clever and time-saving. (This sort of clever dynamic Do-What-I-Mean-ing is one of the reasons that I like Perl for example).

Gilgamesh Cells can contain numbers, Strings, booleans, Dates etc. This means that as Simkin stands, if a Cell contains an integer it would first be stringified with toString() and then parsed back into an integer! Also, it'd be nice to overload the object to have finer control over how its values are used in calculations. As a simple example, an object could return "Two" as a String, and '2' as an integer.

As another example, we might want an Error value to exist. If it's wrapped by a function (=if(iserror(A1), "!!!", A1)) then just evaluate it. But if we try to get its value (=A1 + 3, where A1 contains an Error value) then it will throw an exception which will cause the entire expression to return an Error.

I've created a proof of concept interface Evaluable, and modified some of the methods in Interpreter.java (intValue, boolValue etc.) to use its behaviour where possible.

Overloadable

Then I went slightly over the top and a proof of concept for arithmetic overloading... As J2ME doesn't support floating point class, I'll want to implement a Fixed Decimal class at some point. Also, I want to support dates. Dates can be added to numbers (21st January + 1 = 22nd January) but not to other dates (21st January + 10th June = ???). To support all this complexity in the interpreter itself is going to be nasty without some hefty refactoring. So I discovered a solution of extracting operations on conforming Objects into a nice interface (Overloadable) and letting the Classes that implement it worry about how to operate on each other...

The Overloadable class defines methods like .add, .subtract etc. When the Interpreter wants to (say) add two values, it will check if either of them are Overloadable. If so, it'll ask this object to handle the operation with the other value as its parameter. As an object could be asked to operate with any other Class, the second parameter will be passed as an Evaluable object to ensure a consistent interface.

(If neither object was Overloadable, then we fall back to operating on them as Integers or Floats as appropriate.)

So...

I'm glad that I don't have to implement my own macro language from scratch. Simkin has some really nice features, and I'm happy to work with it and possibly contribute something to it. (Of course now I have to hope that these ideas work with the development of Simkin in general).

Posted by osfameron at 01:26 AM | Comments (0) | TrackBack