tags wrapping things that aren't supported by MIDP. (Floating point, Reflection etc.) This seems to be a fairly reasonable use of #ifdef. Then I vaguely remembered (and confirmed) that Java doesn't have a preprocessor, and therefore doesn't support ifdef!//#ifdef ... //#endif
So... I wonder how Simkin is supporting this conditional compilation: is there an Ant task to preprocess versions of a source into appropriate directories and then build them?
And, if ifdefs are really considered such bad practise that the dubious miracle of "Write once run anywhere" was supposed to eliminate them, how should this be written?
From a vague recollection of (and sudden enlightenment on the purpose of) the Factory design pattern, I suppose that the code should be factored out into (taking the Interpreter class as an example)
(Googling "ant conditional compilation" finds various links such as
this one which look relevant)
Update: Aha, it looks like
Antenna, an
Ant extension specifically for Wireless Java includes a preprocessor task
among other goodies.
I'd originally planned to have a fixed size grid, which could be expanded if necessary. Microcalc works this way: you specify a starting size (8 x 32 by default) but can insert/delete rows later to change the size.
Then I decided that I wanted the grid to be more dynamic: Rows and Columns will be appended when needed, and deleted when not. If I ask for a cell (E111) or a range (A1:Z100) then that object, plus any relevant structure like the Rows and Columns that contain it should also be returned. But only if they are required! To explain what I mean a bit better:
public interface GridComponentCandidate { public boolean isInteresting(); public void registerSelf(); public void deRegisterSelf(); public void addRegistrar(GridComponentRegistrar g); public void removeRegistrar(GridComponentRegistrar g); } public interface GridComponentRegistrar { public void register(GridComponentCandidate c); public void deRegister(GridComponentCandidate c); }
There is an additional complexity for Ranges: the Head Line will need to be registered before the Tail Line. (Otherwise when the Head line tries to register it will be rejected as Out of Date!)
Since the formulas did depend on each other the order of (re)calculation made a difference. The first idea was to follow the dependency chains but this would have involved keeping pointers and that would take up memory. We realized that normal spreadsheets were simple and could be calculated in either row or column order and errors would usually become obvious right away. Later spreadsheets touted "natural order" as a major feature but for the Apple ][ I think we made the right tradeoff.Of course, the Visicalc team were implementing their spreadsheet in "a 16KB target, that included enough space to actually hold a useful spreadsheet" (!). This makes me think
Both of these have an obvious memory cost. Solution 1 can be reduced by having a single convergence limit for *all* cells.
- Having a "convergence limit" for each cell. If the value of the cell (or all cells) changes less than a certain amount, then consider the solution "good enough". If it's not good enough, add all dependent cells to the dispatch list to be calculated again. This has many subtle problems.
- Having a "iteration count" associated with each cell. If a cell tries to solve too many times, it's probably not converging and we aborted the run with an error.
Professional spreadsheets will tend to disallow cyclical calculations by default, but can be configured to allow them. A quick peek at Microsoft Excel 2000 shows that a global option can be set to allow iteration. You can select both a maximum iteration count, and a Maximum change (convergence limit).
I have to admit that I'm a little blinkered in that I've barely ever used iteration in a spreadsheet but my first thought is that Gilgamesh as a constrained J2ME application should not allow circular references at all. Sure, this means reduced power, but the Keep It Simple principle may be the more important consideration. Thoughts?
I could create my own Iterator interface and classes of course, but I'll think a little about other ways to do it too.
Update: 2003-05-09 01:47 OK, here is a version using MIDP's basic Collection structures, Stack and Vector. This DFS class exposes an interface similar to J2SE's Iterator. Note that this version also checks for Circular paths.
import java.util.*; public class DFS { private Stack stack; public DFS(Node n) { stack = new Stack(); push(n); } public boolean hasNext() { if (stack.isEmpty()) { return false; } else { return true; } } private void push(Node next) { Vector seen = new Vector(); push(seen, next); } private void push(Vector seen, Node next) { Vector vNext = new Vector(); vNext.addElement(next); push(seen, vNext); } private void push(Vector seen, Vector next) { stack.push(seen); stack.push(next); } public Node next() throws Exception { Vector next = (Vector) stack.lastElement(); Node nextval = (Node) next.elementAt(0); Vector seen = (Vector) stack.elementAt(stack.size() - 2); if (seen.contains(nextval)) { throw new Exception("Circularity error: " + nextval + " >>> " + seen); } next.removeElement(nextval); if (next.isEmpty()) { stack.pop(); stack.pop(); } if (! nextval.edges().isEmpty()) { Vector newSeen = vecCopy(seen); newSeen.addElement(nextval); push(newSeen, vecCopy(nextval.edges())); } return nextval; } public static Vector topoSort(Node n) { DFS d = new DFS(n); Vector topo = new Vector(); while (d.hasNext()) { try { Object o = d.next(); topo.removeElement(o); topo.addElement (o); } catch (Exception e) { e.printStackTrace(); System.exit(0); } } return topo; } ///////////////////////// private Vector vecCopy(Vector old) { Vector newVec = new Vector(); for (int i=0, s=old.size(); i < s; i++) { newVec.addElement(old.elementAt(i)); } return newVec; } }
I've given this some thought and I'd been planning to get to this shortly, but Alan Eliasen asked about it. Alan gave a good link to Bob Frankston's notes on the development of VisiCalc which also mentioned the evaluation strategy they used:
Every time a cell is changed, loop through all the other cells updating them once. (An obvious alternative strategy is column order).
I'd heard of this, considered it briefly, and rejected it because this brute force sounded inelegant, involving too much unnecessary work, especially for a larger sheet.
Originally, I'd planned for every cell to keep a list of pointers to dependent cells (any cell that references it).
When a cell is updated it will ask its dependent cells to update themselves. Those cells in turn may update other dependent cells: this means that there is a risk of circular referencing. So when I trigger the update I pass a reference to a Collection (an ArrayList, Vector, or HashTable) adding the original cell..-->B--. / \ A +-->D-->E \ / `-->C--'
Of course, the same cell may need to update itself 2 or more times: this is OK, as long as each time is on a different evaluation path. Here, D is evaluated twice, but the cell-paths passed in are not circular:
Each subsequent cell checks if it is in that Collection before appending itself to it. If it's already there, then this calculation path was circular, and should generate an error. Here, cell C is asked to recalculate in a path that already includes C:(A) .-->B--.(AB) / \ (ABD) A +-->D------>E \ / (ACD) `-->C--'(AC) (A)
(A) .-->B--.(AB) / \ (ABD) A +-->D------>E \ / (ACD) | `-->C--'(AC) | (A) ^ / | / `------------' (ACDE) ^ Circularity Error here!
This would work, but means that the same cell may have to recalculate more than once, which could be inefficient, and might have some pathological cases.
It would be best to calculate in which order the cells have to be updated, and then calculate them all once only. This is called 'natural order'. And while you might think that calculating the order to evaluate cells in was a tricky task, in fact there is a simple way to do it.
If you have a Computer Science background, you might recognize the dependency diagrams above as a DAG, or Directed Acyclic Graph. And you might realize that all we have to do get the evaluation order is to run a topological sort on single-source DAG. If you don't understand that, then read on!
Graphs in computer science aren't the same as the nice bar-charts and diagrams that non-geeks might expect them to be. They are just
Topological sort is a sorting technique that involves visiting each node and the nodes it links to in turn (often called 'traversing' the graph) in such a way that all dependent nodes are listed last. You can walk the graph in 2 ways, "Breadth first" (BFS), or "Depth first" (DFS). This is actually rather intuitive:
Starting with cell A we follow this branch to its end (depth first).-->B--. / \ A +-->D-->E \ / `-->C--'
It is fairly trivial to implement a minimal graph-walking algorithm doing a simple recursive tree walk. Of course it's often more elegant to do this iteratively. (For example Dominus will have some chapters on this in his forthcoming book). Using the Iterator interface is also Java best practise. (Though I think J2ME doesn't define the Iterator interface).
Here's some basic setup for a basic Node class.:
import java.util.*; public class Node { private Vector edges; private String name; public Node(String name) { edges = new Vector(); this.name = name; } public Collection edges() { return edges; } public String toString() { return name; } }It is fairly easy to add a DFS Iterator class:
import java.util.*; public class DFS implements Iterator { private Vector iterators; private Node curr_node; private Iterator curr_it; public DFS(Node source) { curr_node = source; curr_it = curr_node.edges().iterator(); iterators = new Vector(); iterators.addElement(curr_it); } /** The class works by keeping a stack of Iterators * for each fork in the Graph. The hasNext() method * checks if any of the Iterators in the stack have * any more Nodes to return (e.g. if they return true * on hasNext()). */ public boolean hasNext() { while (! iterators.isEmpty()) { curr_it = (Iterator) iterators.lastElement(); if (curr_it.hasNext()) { return true; } iterators.remove(iterators.lastElement()); } return false; } /** Every time a new object is selected, add the * Iterator for its own edges to the stack. */ public Object next() { Node next = (Node) curr_it.next(); Iterator new_it = next.edges().iterator(); iterators.addElement(new_it); curr_it = new_it; return next; } public void remove() { ; // just to fulfil requirements of interace. } }(Adding checking for cyclic paths in the graph is left as an exercise to the reader...)
And of course, add a method to the Node class:
public Iterator DFSIterator() { return new DFS(this); }Now it's easy to create the Topologically sorted dependency list (add this method to Node class). Iterate through the Graph, adding the latest node to the end, first of all removing any previous occurrences:
public Vector dependencyList() { Vector v = new Vector(); v.add(this); Iterator it = this.DFSIterator(); while (it.hasNext()) { Object o = it.next(); if (v.contains(o)) { v.remove(o); } v.add(o); } return v; }
Coming up soon:
I've come across some other projects, and had to re-question whether Gilgamesh will have its own expression evaluator as I suggested above. All comments welcome!
At the pub-meet I met Simon Whiteside, enthusiastically promoting his Open Sourced scripting language, Simkin. I'd read the blurb on it before, and thought "huh?" before bookmarking it and moving on. Simon says that Simkin sometimes gets this reaction, but has convinced me that it's worth looking at in more detail!
It's lightweight (50K jarfile), is implemented in both C++ and Java, and has been ported to J2ME.
it is also an expression evaluator which, if I understand rightly, can easily be customized to include custom objects. So, for the purpose of Gilgamesh, something like
=A5 * 4 + sum(B2:C5)Could actually get parsed as a Simkin expression which refers to Gilgamesh objects for the Cell (A5) and Range (B2:C5).
Though I'm happy to steal, um, leverage existing code to parse formulas (as I wrote above, I was quite happy to leave this to someone else!), I guess that it will have some impact on Data representation, and on formatting of Cell results. (I'll try to write up my initial thoughts on these soon). So I have to look into any existing code and API before I commit. Some good documentation from the Simkin site itself, including links to off-site articles.
If I do use Simkin, I should also think about using it as an on-board extension language!: I hadn't thought about having an scripting language in Gilgamesh, because it seems like unnecessary bloat, but if I get it for free it's worth thinking about.
Some time ago, I looked at the Micromath library, which powers Microcalc. Microcalc is one of the big influences on this project, in terms of inspiration, and a benchmark to try to compete with. Rafe has suggested it'd be worth getting in touch with the Microcalc development team (Michael Zemljanukha) to discuss the two projects: now on my todo list!
J2ME doesn't support Floating Point maths. Micromath gives Fixed Decimal maths support, and is a very compact Java library. However I'm leaning towards using the MathFP library (mainly because it has better documentation.)
Alan Eliasen has posted a comment mentioning his project Frink. From a first quick read of the (nicely detailed) docs, you could describe it as a mix between an expression evaluator that knows about the world, a miscellany of facts and figures, and a programming language that, like Perl, tries to DWIM ('Do What I Mean') in some really clever ways.
In one of the wonderfully whimsical examples in the docs, he calculating the rainfall in the Biblical Great Flood using the formula:
depth = 29030 feet + 15 biblicalcubits - (2.4 inches/ year 4000 years)Frink knows about feet, and inches, and years, and um... biblical cubits. Not to mention currencies, current and historical!
It also tracks the unit type of a calculation, making sure that you do sensible things with them: e.g. you can add 1 mile + 4 metres but it wouldn't make sense to add 1 mile + 4 volts. (2 metres) * (2 metres) will actually know to return an area in square metres! I love this idea: surely this should be mainstream?
Alan noted that he'd thought about using this in a Spreadsheet: I would love to see that happen, and I have to admit I'd be seriously tempted to use it in Gilgamesh. On the minus side:
But of course this is an oversimplification. We don't just need a list of cells that we reference: we need to know how they interact with the formula. The formula will need to be parsed into a tree of data relatingA +------+ 1| 1 | +------+ 2|A1*2 | <-- references: A1 +------+ 3|A1+A2 | <-- references: A1,A2 +------+
The last example shows a more complicated example, demonstrating that parsing the syntax could be quite intense. Of course, even a standard, commercial product like Excel simplifies the syntax by making it more functional:A1+A2 - function(+) - ref(A1) - ref(A2) if(A1==23 then "Yes" else "No) - function(if) - function(equals) - ref(a1) - constant(23) - constant("Yes") - constant("No")
As you can see, this is much more easily translated into the parse-tree above.if(equals(A1, 23), "Yes", "No") // Excel syntax
It looks a bit odd if you're not used to it, but it's trivial to build the parser. My first design idea is this:(function arg1 arg2 arg3 ...) Converting the formulae above: (+ A1 A2) (if (equals A1 A2) "Yes" "No")
=1 + 2 // arithmetic operators come inline =function(arg, arg) // functions as standard =A1 // reference unchanged
The Function class will have a simple interface something like:(if (= A1, 23), "Yes", "No) - Formula: - Function: if - Formula------------. - Constant: "Yes" \ - Constant: "No" \ - Function: = - Reference: A1 - Constant: 23
and is extended by subclasses like so:public abstract class Function { public String display; public boolean CheckArgs(Value[] args) { // ... (filled in by subclass) } public Value execute(Value[] args) { // ... (filled in by subclass) } }
public class EqualsFunction extends Function { public EqualsFunction() { display = "="; } public boolean CheckArgs(Value[] args) { if (args.size == 2) { return true; } else { return false; } } public Value execute(Value[] args) { if args[0].equals(args[1]) { return BooleanValue.TRUE; } else { return BooleanValue.FALSE; } } }
This might well be too heavyweight for being on a J2ME constrained device. But I think it will have its advantages: we'll see as we go on and try to optimize later if we need to. More on this topic later.