Bizarre Love Triangle

Correct me if I’m wrong (and I’m sure I am), but I’m not sold on the Tuple and Function implementations in Scala. A Tuple is just a typed wrapper around multiple elements, allowing you to treat multiple objects as one object. Their difference with other wrapper/container constructs like List (or Collection in general) is that we know, at compile time, a) the number of elements and b) the type of each element. Functions follow a similar compile-time pattern, since you know the number and type of elements for the function at compile time. Tuples are very useful constructs – for example, you can imagine any n-arity Function as being a 1-arity Function taking a single TupleN argument; this elegance is expressed in Scala’s FunctionX.tupled, overloaded for functions of arity two through five. If you have some Traversable of Tuple2s, you can create a Tuple2 of Traversables holding the corresponding elements of each Tuple2 by simply calling Traversable.unzip. There’s a similar method for Tuple3’s called unzip3.

Unfortunately, design imperfections start to creep in when you try taking advantage of Tuples. None of the TupleNs have any class relation to each other except for all being subclasses of Product. I haven’t explicitly run into this problem yet, but it’s still troubling to think that Scala treats Tuple1._1, Tuple2._1, and Tuple3._1 as completely different properties. Product has a productElement(int) method that returns the nth element of the Product, but the guarantee is so weak that the method a) returns an Any and b) can only throw a runtime exception. productElement breaks both type safety and compile time guards, two of Scala’s stronger selling points. productIterator similarly returns an Iterator[Any]. I’m not sure why productIterator can’t return an iterator of the common supertype of all its elements. None of the FunctionN’s are even related to each other except for AnyRef. The Function object’s tupled, untupled, and uncurried methods have separate, overloaded implementations for Function2, Function3, Function4, and Function5. Function lacks a method returning the number of arguments it takes, which I feel should be easily done without copypasta. I guess the number of arguments is embedded within the Function’s type, but that’s programmatically annoying and unintuitive to access. unzip and unzip3 exist and satisfy most use cases, but it smells (as a consequence of the way Tuples are implemented). Code repetition is to be avoided – but the Scala core library itself contains lots of repetition on this front.

I feel like the problem lies in characterizing the relationship between Tuples of different sizes. We can think of any Tuple2 as a Tuple3 with an “empty” element (perhaps a Tuple3[A, B, Nothing] type or Tuple3[A, B, Unit]); a Tuple3 can also be seen as a Tuple4 with an “empty” element, etc. etc. One could put some upper limit on the Tuple arity (Scala’s already doing this) and have all other Tuples descend from that class, but there will still be a bunch of code boilerplate (create 21 classes, with each TupleN extending Tuple(N+1)), and the performance penalty of carrying a bunch of Units around would be hefty to say the least (perhaps specialization could help us here). Also, is Tuple3[A, Nothing, B] the same “thing” as Tuple3[A, B, Nothing]? I guess it depends on the application and programmer to assign meaning to the elements in the Tuple. If we instead build up, a Tuple3 can be seen as a Tuple2[A, Tuple2[B, C]]; a Tuple4 is a Tuple3[A, Tuple2[B, Tuple2[C, D]], or some similar Tuple nesting scheme, etc. etc. This looks to be a more reasonable approach to building Tuples, and is also the way Lisp and most of FP does it. Scala’s own List class has :: and Nil. The relation between Tuples and Lists is something to be explored. This connection is further complicated when Functions enter the scene, especially with regards to function currying. Naively, I think the problem lies in that there are multiple representations, in code, to achieve roughly the same meaning, and that there are no ways to easily convert from one representation to another. A Function3[A, B, C, R] and a Function1[(A, B, C), R] and a Function3[(A), (B), (C), (R)] and a Function2[(A, B), C, R] could all be the same thing. I don’t even want to get into the different ways you could group smaller Tuples to create a larger one.

Advertisements

Expression Evaluator

I’ve always wanted to have the capability to type in expressions into a sketch and have it react accordingly; to me, it’s kind of always been the “next level” of interactivity with a sketch. I remember a while back there was a sketch, possibly on OpenProcessing (but I can’t find it now) that had a series of about 20 circles rotating about like some sort of solar system, and there was a text field letting you type in an expression as a function of n (presumably the “number” of the circle) which determined the size of the n-th circle. Or something like that. I LOVED that sketch since you could get a massive variety of beautiful interactions by just changing what the expression was.

To be honest I’m very surprised that there isn’t a math/symbolic expression library within the Java standard library. Imagine if there was a standard mathematical input widget within AWT, or Swing, or even controlP5 – great ideas would spring out from all corners of the sketch community. I might even consider creating and maintaining a simple math library to help others along. I personally have always felt a longing for a simple math library for several years. I remember trying to write a symbolic evaluator back in 10th grade of high school – and I got SWAMPED. Of course, I lacked the necessary understanding of both math and programming back then, but I didn’t even get close. I wanted to write programs involving mathematical functions, but the libraries just didn’t exist.

Anyways, enough of the rambles. An arithmetic calculator is the flagship example of Parser Combinators, and I found it enjoyable to extend the “usual” grammar to include variables and functions. I have a feeling that this grammar is going to be heavily used in my future sketches. Sep12 isn’t fancy; it just demonstrates the evaluation capabilities.

See the sketch here.

Sep12


Concepts, pt 1

I’ve been thinking for a while about a programming language that tries to make as much as possible explicit and controllable. That is, a language that has the mechanisms to define and program core features of the language itself. I think this would solve a large number of programming annoyances and problems, because I believe that the problems are caused by a) unexpected or unwanted consequences to the formal rules and mechanics built into languages, and/or b) a lack of abstraction and reference mechanisms. For instance, consider writing a singleton object. A singleton is basically the combination of a class and its associated type, plus exactly one instance of that class. We group these multiple concepts together as a single conceptual unit, which we call the singleton. In Java, it might look like this:

class MySingleton {
   private static MySingleton instance = null;

   public static MySingleton get() { 
      if(instance == null)  instance = new MySingleton();
      return instance;
   }

   private MySingleton() {...}
}

See all that boilerplate? It’s frustrating. It’s frustrating because the concept of a singleton is very simple. But singleton operates on the level of classes, types and instances, which are silent, built in formalities within Java. Java alleviates *some* of the issues by giving you a class for classes (succinctly named Class), but it’s not enough. We would like to write an abstract class representing the idea of a Singleton, but Java doesn’t even come close to giving you the capabilities.

The Java concept of a “class” tries very hard to be isomorphic to – or “equal to” – our mind’s concept of “concept”. When we write something like “class Dog extends Animal” in Java, we’re trying to map out a one-to-one, invertible relationship between our mind’s concept of a Dog and OOP’s “concept” equivalent, class. Keeping this connection accurate is incredibly important to our ability to comprehend a program, because it allows us to reason and attach meaning to the program in a manner we’re very familiar with – reasoning in our own mind.

Let us call this connection the Conceptual Function. It is actually a pair of functions: the first takes concepts in our mind as input and outputs concepts in code; the second takes concepts in code as input and outputs concepts in our mind. These two functions should be inverses of each other. We strive, in programming, to adhere to that principle; to keep the Conceptual Function as nice and clean-cut as possible – more formally, to make the Conceptual Function both one-to-one and onto (I’m abusing definitions here since we want BOTH functions to be one-to-one and onto, but when they’re inverses of each other the invertibility of one implies the invertibility of the other). “class Dog extends Animal” reads very nicely because the Conceptual Function is invertible; our mind’s concept of a “Dog” is represented very accurately as the Dog class – similarly, the Dog class in Java represents one very clear concept in our minds – that of a dog.

Unfortunately, Java’s class is far from being a perfect representation of a concept. The singleton example shows that. At the very least, classes would have to be first-class members, so that we could, say, implement the singleton as a method:

public Pair<Class, InstanceOfClass> singleton(Class parent, List<Declaration> body) {
   Class newC = parent.deriveClass(body);
   InstanceOfClass inst = newC.createInstance();
   return new Pair(newC, inst);
}

This method takes a class and some extra information as arguments and outputs a class and instance pair; obviously this is way out of bounds. Issues arise regarding the definition of sameness and “single”-ton: if you call the method with the same parameters multiple times, then you’d get multiple singletons representing the same mental concept – and boom your Conceptual Function starts getting thrown off. But this issue of “sameness” has been a problem plaguing all programming languages. What if you wrote a singleton in legal Java, copied and pasted it into another file, and then renamed that class into a different class? Are those two the same? This is an issue I’ll talk more in depth about in a later post. Scala alleviates this use case of Singletons by introducing special syntax (‘object’) that basically hooks up the wiring for you, but that’s like trying to cover up all the holes on a sponge by taping bits of paper over every hole – there’s a better, more novel way to solve the problem.

This area of thought is extremely interesting to me and I have a lot to say about it. Expect future posts in this series to be about the Conceptual Function, the mechanisms of naming, mental models and how concepts fit into them, and finally I’ll try to flesh out ideas for a new language to address these issues I’m talking about.


Semantic patterns

Processing’s PVector, JBox2D’s Vec2, java.awt.geom.Point2D, Geomerative’s RPoint, JTS’s Coordinate, point2line’s Vect2, toxiclibs’ Vec2D/Vec3D… the list goes on. I’d conservatively say that there are no less than 10 slightly different implementations of vector/coordinate/point classes on my laptop now. Each one of them has an x/y field, each of them has an “add” and “subtract” equivalent; all of the vectors have scale, magnitude, etc. type operations.

Why are there so many versions of what is basically the same idea? Why is there so much code repetition? I think the most straightforward answer is that each library has a different ideology, a different code style, and with that a need for a different implementation. Some (most) implementations use mutable floats for the x/y values; others make immutable classes whose operations return new objects. Some methods are named “scale”, and some named “mult”. Coordinates and points might be in two dimensions, or three dimensions, or N dimensions. Depending on the library it was built for, each class will have different features and shortcomings.

But there is obviously some overlap. I understand that there’s a “difference” between points, vectors, and coordinates in the theoretical sense, but as far as the computer is concerned, we’re working with two or three floats and a set of methods that operate on these floats. I can assure you that the method for adding vector a to vector b will look like:

public Vec2 add(Vec2 b) {
     return new Vec2(x+b.x, y+b.y);
}

The method might instead add the given vector’s coordinates to the local variables, like:

public void add(Vec2 b) {
     x += b.x; y += b.y;
}

Regardless of the exact implementation, the abstract idea is the same. We’re trying to add two vectors together. So, being a computer scientist and obsessed with efficiency and uniformity, I think there should be some way to tell a program that these multiple classes represent the same thing. There needs to be a way to give semantic information about what a class represents to a program, and then a way to let that program use the information. This raises deeper philosophical questions as to how to tell if two ideas are “the same” in the first place, but I think the inherent subjectivity in meaning could be circumvented using the formality of mathematics and logic. At this point in time I see the very bottom of math being composed of two things: numerals and sets. Numerals could probably be encoded as sets (0 = empty set, 1 = set containing empty set, 2 = set containing 1, etc) but that would probably create a lot of tedium and too much rigidity; making numerals arbitrary is fine. Perhaps to build a programming language off these extremely basic ideas, and keep building until you have today’s useful data-types and structures, would allow for a whole new level of abstraction. But I’m getting away from myself.

Personally, I really like how JBox2D’s Vec2 class contains operations that always returns the result (aka the first “add” method as seen above). But I’m also writing a 2D pan-and-zoom camera class for Processing, and I don’t want to add a dependency on JBox2D just to get at his Vec2 class. I could copy and paste and write yet another vector implementation, but then the two classes would be incompatible with each other even though they are effectively the same class and I’d get frustrating “incompatible type” errors if I used JBox2D with Processing (which happens a lot). I settled for using Processing’s PVector because it was included in the library. Unfortunately, the PVector’s add, scale, sub, etc. operations all return void (meaning they mutate the variables locally). There are static methods that take two vectors and add/sub/cross/etc. them together and return the resultant vector, but it leads to ugly code like

...
        PVector sub = modelPoint.get(); sub.sub(corner); sub.div(s);
        corner.set(PVector.sub(modelPoint, sub));
...

Something like corner = modelpoint.sub(modelPoint.sub(corner).div(s)) (or even better corner = modelPoint - ((modelPoint - corner) / s) as in Scala) is much more concise. Of course, the methods with those signatures exist only in Vec2, and not in PVector. But why? Why couldn’t the computer use the sequence of operations as prescribed in Vec2 on PVector? Specific computer details aside (aka allocating memory, reading and writing the floats), it makes sense to be able to use Vec2’s signature with PVector. I could write a Vec2 version of add, mult, scale, normalize, etc. in PVector with absolutely no problems, because the conceptual ideas of the two classes are equal. I want to say that the problem is that there is an extremely closeknit coupling between a class’s bytecode and the semantic idea behind it, but I feel like I’m mis-using terminology in there. Subtyping a common abstract superclass could solve this problem but subtyping is dependent a) on the programmer recognizing that the supertype exists (or on a supertype existing at all), and b) the programmer deciding that the supertype is well suited to fit the implementation. Rarely does this actually happen outside of class hierarchies built all at once, under the same project.

This idea is not restricted to vectors. Every class (and even every package and every project) has certain concepts behind it that defines exactly what the class should be doing. One might percieve a mathematical function that takes concepts as input, and with some further configurations and considerations taken into account, outputs a class or set of classes. You can perceive programming as the act of computing this function. This function is certainly not one-to-one, as my vector example showed. Different configurations for the function will have it output different classes but these classes should share a common type, a common theme.


The evolution of code

I’m wondering what’s the best way to write good, maintainable, logical code that evolves and adapts as your needs change.

For instance, right now I’m developing a multiplayer game built off JBox2D, but the code originally started off as just a way to see if I could save/load JBox2D Worlds. I showed the program to a few friends and they got excited about the physics and proceeded to generate a lot of good ideas for games that could be created out of it. So I’ve been adding this bit and that bit; right now I’ve got two-player synchronization and a few game mechanics in, but I’m running into problems with attaching game specific data onto the World (I’ve been using a Map<String, String> as my UserData that gets serialized with the Body). Long story short I’m now finding myself re-writing serialization code for my own application-specific classes. It would be much easier to have written wrapper classes for World and Body (and make them implement Serializable) to begin with, but of course at the beginning I didn’t know what direction the program was going to take.

That tends to happen with programs – the more you write code, the more you realize that you need to completely rewrite the code. Is there no other way? One could plan ahead and design classes and abstractions from the beginning to alleviate some of the necessity to re-write code, but sooner or later I always find myself doing a complete overhaul of this part or that part (or every part) of the code. The basic problem is that you never know what direction a project will go in; you can steer it but there will always be new things that pop up – new features, overlooked problems, a change in concept, etc. Is the only solution to overhaul the internal mechanics, again and again? You could possibly write a lot of abstraction barriers to begin with but in general any kind of decision may need to be changed.