Stringy identifiers

May 28th, 2008

In most computer languages, identifiers can only contain alphanumeric characters (plus, often, the underscore '_') and can't start with a number. There are very good reasons for this, but there are occasions when it would be desirable to be able to use arbitrary strings as identifiers. For example, suppose that we have a language that has a parser generator built in. Suppose also that we give it a parsing rule like this:

WhileStatement := ("while" | "until") "(" Expression ")" Statement

This is really just another way of writing a parsing class:

class WhileStatement { ... }

We would like the members of this class to be automatically generated by the parser generator in a predictable way. With string-identifiers, the generated class might look something like this:

  class WhileStatement
  {
    Enum { `"while"`, `"until"` } `"while" | "until"`;
    Expression expression;
    Statement statement;
  }

i.e. the name of the member variable which holds the piece of data about whether this is a "while" loop or an "until" loop is called `"while" | "until"`. The type of this member is

Enum { `"while"`, `"until"` }

which means that it can take only the values `"while"` and `"until"`.

This class can be used as follows:

  WhileStatement whileStatement = WhileStatement::Parse(file);
  if (whileStatement.`"while" | "until"` == `"while"`)
    ...

This reads very naturally, and avoids the need for anyone (human or compiler) to invent variable names. It's good to avoid making the programmer think up names for things when it isn't absolutely necessary.

Random numbers

May 27th, 2008

A commenter asked how random programs like random song shuffle work. How can a computer (which follows the same instructions each time) have unpredictable behavior?

Well, to understand this, first think about how random numbers are generated in the "real world" outside of computers. One way to do this is to throw dice - as long as you don't do something tricky, most people would agree that that's a perfectly reasonable source of randomness.

Except that if you really think about it, it isn't really random at all. In principle, one could make a machine that throws dice exactly the same way each time and always get a double six. You'd have to isolate it from stray air currents, minor gravitational influences from passing traffic and so on (as needs to be done for many sensitive scientific experiments). If you think about it, the randomness you get from throwing dice is really just due to these kinds of factors (and the fact that we human beings can't control our muscles finely enough to exert precisely the same forces each time). So really, what you're doing when you're throwing dice is "amplifying" these unpredictable factors (air currents, position, throwing force etc.) so that any tiny changes in them result in an entirely different number coming up.

Computers generate random numbers the same way. Well, sort of - they don't simulate the physics of three-dimensional dice being buffeted by air currents and bouncing off things - they do something that doesn't require such a complicated program, but the same principle is at work. One reasonably common method (not the best, but fine for uses such as games) is to multiply by 22695477, add one and then take the remainder after division by 4294967296. If I give you the sequence 953210035, 3724055312, 1961185873 you'd be hard pushed to tell me that those numbers were generated by that formula starting from the number 42 (maybe not quite so hard pushed as telling me the precise forces acting on dice given the numbers that came up, but you get the idea). Other random number generators use the same idea but with more complex formulae to make the patterns even more obscure.
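
In code, a generator like that (a linear congruential generator) is only a couple of lines. Here's a minimal sketch, using the constants and the example seed from above:

  #include <cstdint>
  #include <cstdio>

  // Multiply by 22695477, add one and keep the remainder modulo 2^32 - the
  // wrap-around of 32-bit unsigned arithmetic does the "remainder" for us.
  uint32_t next(uint32_t state)
  {
      return state*22695477u + 1u;
  }

  int main()
  {
      uint32_t x = 42;                    // the starting value ("seed")
      for (int i = 0; i < 3; ++i) {
          x = next(x);
          std::printf("%u\n", x);         // 953210035, 3724055312, 1961185873
      }
  }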

The problem with this method is that (as with any computer program) given the same input you're going to get the same output every time. And indeed this was a problem with some home computers and calculators in the 80s - if you tried to use the random number generator right after turning the machine on you'd get the same "random" numbers each time. The solution is to "seed" the random number generator somehow. The usual method is to use the current time. While it's predictable in some sense, you will get a different random sequence every time in practice so for most purposes you will never notice a pattern.

For some other purposes (like generating cryptographic keys or passwords, which have to be unpredictable even to people who are trying very hard to predict them) just using the time isn't enough and you have to mix in other sources of unpredictability (or "entropy" to use the technical term). This is why some SSH clients ask you to bash on the keyboard or wiggle the mouse the first time they are run - someone who wants to intercept your communications would have to know exactly which keys you pressed and how you wiggled the mouse (with extremely high precision). This is very easy to get wrong (and difficult to know when you've got it wrong) and can lead to security holes. Hence the old saw that "generation of random numbers is too important to be left to chance"! Some computers even have hardware built in which generates random numbers from unpredictable physical processes like thermal noise or quantum wavefunction collapse.

Once you've got a sequence of (pseudo-)random numbers evenly distributed over some range (for example 0 to 4294967295 in the above example) there are a variety of techniques to massage these into various different forms for different purposes. If you want an integer in the range 1 to 10 you can find the remainder after division by 10 and then add 1. If you want a random real number in the interval 0 to 1 you can divide by 4294967295. If you want numbers from a Gaussian distribution you can use a Box-Muller transform. And if you want to shuffle songs in your playlist you can start with an empty playlist and add a random song to it (removing it from the original playlist each time) until the original playlist is empty.
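
The shuffle described above might look something like this - a minimal sketch in which rand() stands in for whatever generator the player actually uses, and the function and song names are invented for illustration:

  #include <cstdlib>
  #include <ctime>
  #include <string>
  #include <vector>

  // Repeatedly pick a random song from the original playlist, move it to
  // the shuffled playlist, and stop when the original is empty.
  std::vector<std::string> shufflePlaylist(std::vector<std::string> original)
  {
      std::vector<std::string> shuffled;
      while (!original.empty()) {
          size_t i = std::rand() % original.size();
          shuffled.push_back(original[i]);
          original.erase(original.begin() + i);
      }
      return shuffled;
  }

  int main()
  {
      std::srand((unsigned)std::time(0));  // seed with the current time
      std::vector<std::string> playlist;
      playlist.push_back("song1");
      playlist.push_back("song2");
      playlist.push_back("song3");
      std::vector<std::string> order = shufflePlaylist(playlist);
  }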

This last algorithm has a flaw in some sense (though it's more of a flaw in our brains). While all possible orders are equally likely, it will tend to play two songs in a row by the same artist more often than you would expect (just look at the bug database for any open source media player application to see the evidence for this). The problem isn't that the numbers aren't random enough, it's just that if you have n artists in your playlist you'll see this phenomenon once every n songs on average. We tend to notice when it happens, and because the times when it happens stand out more than the times when it doesn't, the problem seems worse than it really is. Another factor is that we are used to radio stations, which deliberately avoid doing this by hand-picking the order of songs rather than using a random shuffle. Some media player applications have appeased the complainers by actually making their shuffle features *less* random - adjusting the algorithm so it avoids picking orders which have two songs by the same artist in a row. This seems hacky to me, but if users like it I suppose I can't disagree with that.

Object relational mismatch

May 26th, 2008

I don't have a lot of experience with SQL databases. Most of the programming I have done is object oriented. It's a little odd to hear about the "object-relational impedance mismatch" because, as far as I can tell, the two paradigms complement each other quite nicely.

Object oriented programming creates complex things by sticking simple things together. So you might have a Customer object which can have any number of Order records. The Order records associated with a particular customer are easily accessible to code running in the context of a particular Customer.

Relational programming slices things up the other way. There is a table of Customers and (completely separately) a table of Orders, each of which points back to a particular Customer. All the Orders are stored together no matter which Customer did the ordering. This has the advantage that you can easily look through the Orders (for, e.g., every Order of a particular widget) without looking through all the Customers first.

The downside is that relational programmers spend lots of time writing things like "SELECT o FROM Orders where o.customer==customer" to retrieve the data that would just "be there" in context, if you were using objects.
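
To make the contrast concrete, here's a rough sketch of the same data seen from each side (the Customer and Order members are invented for illustration):

  #include <string>
  #include <vector>

  // Object-oriented view: a customer's orders are nested right inside it,
  // so code running "in" a Customer just has them to hand.
  struct Order
  {
      std::string widget;
      int quantity;
  };

  struct Customer
  {
      std::string name;
      std::vector<Order> orders;   // the orders are simply part of the object
  };

  // Relational view: all Orders live in one table and refer back to their
  // Customer by key, so a query like
  //   SELECT * FROM Orders WHERE Widget = 'sprocket'
  // can scan every order without going through the Customers table at all.
  struct OrderRow
  {
      int orderId;
      int customerId;              // foreign key into the Customers table
      std::string widget;
      int quantity;
  };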

It seems natural to try to combine these two styles - to persist objects to long-term storage by slicing them up into normal-form components which are then stored in tables representing different classes, and to automatically generate queries for interesting chunks of related data and construct objects based on them. Then you could generate different object structures for different problems (for example, your Order objects might be organized by widget instead of by customer for stock-keeping purposes).

I know some things like this have been done, but it seems such a clearly useful thing to do that I'm surprised it isn't completely ubiquitous by now. Why don't OSes come with transactional databases as basic system services? Is it just that the object people and the relational people don't talk as much as they should? It would be really cool if a compiler stored its syntax tree in such a way that I could make a list of (for example) the set of "if" statements in a program being compiled.

Mixins in C++

May 25th, 2008

A desirable property of C++ class libraries is to be able to leave the inheritance hierarchy up to the user of the library. For example, suppose you are writing a UI framework. You have a "Window" class - every window in the system is represented by an object of some subclass of Window. Windows can be animated, scrollable or dockable. Each of these attributes has some data and methods associated with it, so naturally forms a class. But any combination of these attributes also forms a class. So to be fully general your library would have to provide 2^n possible classes - clearly not very practical.

A solution is to provide each attribute as a Mixin (the name was inspired by an ice cream parlour near MIT which allowed you to mix extra items like crumbled cookies and jelly babies into your base ice cream flavour). A Mixin is a template class parameterized on its superclass, like so:

template<class Base> class Animated : public Base { ... }

Then you can specify a class hierarchy like:

Animated -> Scrollable -> Dockable -> Window

as the template class:

Animated<Scrollable<Dockable<Window> > >

The trouble with this is: how do we construct an object of such a type? An object's constructor needs to know how to construct its base class, which means that you'd still need to write 2^n constructors. I found a paper which provides a rather convoluted solution using all sorts of macros and template programming wizardry, but I thought of a much simpler way.

All you need to do is have some sort of standard for constructing an object. For example:

template<class Base> class Animated : public Base
{
public:
    // Params bundles this mixin's own constructor arguments together with
    // the Params of its base, so the whole chain can be passed as one object.
    class Params
    {
        friend class Animated;
    public:
        Params(const typename Base::Params& bp, ...) : _bp(bp) { ... }
    private:
        const typename Base::Params& _bp;
    };
    Animated(const Params& p) : Base(p._bp) { ... }
    ...
};

And similar nested classes (with the same name) for all the other mixins and the base flavour class.
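
For concreteness, here's a sketch of how the class at the bottom of the chain might follow the same convention - the single string parameter is just a stand-in for whatever data Window actually needs, to match the "blat" argument below:

  class Window
  {
  public:
      class Params
      {
          friend class Window;
      public:
          Params(const char* title) : _title(title) { }
      private:
          const char* _title;
      };
      Window(const Params& p) { /* construct the window from p._title */ }
  };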

Then you can instantiate your composite class as follows:

Animated<Scrollable<Dockable<Window> > >
    window(Animated<Scrollable<Dockable<Window> > >::Params(
        Scrollable<Dockable<Window> >::Params(
            Dockable<Window>::Params(
                Window::Params("blat"), "baz"), "bar"), "foo"));

or, if you don't like typing n^2 identifiers:

Window::Params wp("blat");
typedef Dockable<Window> D;
D::Params dp(wp, "baz");
typedef Scrollable<D> S;
S::Params sp(dp, "bar");
typedef Animated<S> A;
A::Params ap(sp, "foo");
A window(ap);

You also get the advantage that your calling code says exactly which parameters belong to which mixin, improving type checking (similar to the named parameters construct described in D&E).

Scripted merge

May 24th, 2008

I've been thinking a bit about version control systems. It occurs to me that almost every element of a version control system is actually fairly easy technically except for two - diff and merge.

Diff is the process of finding the difference between two files - I've written about some ways to deal with this in the past.

Merge is the process of taking two changes X and Y and finding the change Z that consists of both X and Y. This sounds easy but it's really solving the "A is to B as C is to what?" problem (the original file being A, the result of applying X being B, the result of applying Y being C and the merged result being the answer). Suppose X is adding a method to a class and Y is renaming the same class. Most merge systems would add the new method referring to the class by its original name (since Y's rename didn't cover the newly added lines).

I think the answer is to treat every change as a little program in its own right. X isn't just "insert the lines corresponding to the new method", it's "insert the lines corresponding to the new method, substituting the current name of the class where appropriate", and Y isn't just "change this list of instances of Foo to Bar", it's "change Foo to Bar everywhere it appears in the context of a class name in namespace Baz". In other words, have the changes themselves actually have some intelligence about what they are supposed to be doing. Then these programs could just be "run" (in any order) and produce the correct result.
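
As a very rough sketch of what such changes might look like - the stable class id and the miniature Codebase structure are my own inventions for illustration, not a proposal for the real representation:

  #include <map>
  #include <string>
  #include <vector>

  // A codebase boiled down to the bare minimum: classes identified by a
  // stable id, each with a current name and a list of methods. Because
  // changes refer to the id rather than the name, they still find their
  // target after a rename.
  struct Class
  {
      std::string name;
      std::vector<std::string> methods;
  };
  typedef std::map<int, Class> Codebase;

  class Change
  {
  public:
      virtual void apply(Codebase& code) const = 0;
      virtual ~Change() { }
  };

  // Y: "rename this class, whatever it currently contains".
  class RenameClass : public Change
  {
      int _id;
      std::string _newName;
  public:
      RenameClass(int id, const std::string& newName)
        : _id(id), _newName(newName) { }
      void apply(Codebase& code) const { code[_id].name = _newName; }
  };

  // X: "add this method to this class, whatever it is currently called".
  class AddMethod : public Change
  {
      int _id;
      std::string _method;
  public:
      AddMethod(int id, const std::string& method)
        : _id(id), _method(method) { }
      void apply(Codebase& code) const { code[_id].methods.push_back(_method); }
  };

Applying X and Y in either order now gives the same result: a class with the new name containing the new method.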

Such changes would be more than just lists of "insert this text in this position in this file" and "delete this text from this position in this file", and could therefore not easily be generated from diffing two files.

However, such changes could be generated by a text editor which has some understanding of the format of the files it is used to edit. For example, if such an editor had a "rename class" command, it could generate a change saying exactly that.

Gun control

May 23rd, 2008

Lots of people in the media (especially in the US) say lots of things about guns. Here are my thoughts on the matter.

  1. Killing people is bad. One should try to avoid killing people as much as possible.
  2. Guns are, fundamentally, designed for killing people (some may be sold for other purposes but killing people is what guns were invented for).
  3. Because of (1) and (2) there is something inherently quite distasteful about guns.
  4. So it is understandable that many people dislike guns and don't want to own one or be around people who have them.
  5. There are some types of weapons (such as nuclear weapons) which no individual should be allowed to own.
  6. There are some types of weapons (such as kitchen knives) which any non-incarcerated adult should be allowed to own.
  7. Not all weapons fall into categories (5) or (6).
  8. There are some people (those with criminal pasts, unsound minds or who are just not sufficiently responsible) who should not be allowed to own guns. Loopholes which allow them to obtain them legally should be closed.
  9. It would be impossible to ban all private gun ownership in the USA.
  10. When a national government bans all its citizens from doing something which was previously allowed and widely practiced, this is usually a bad thing.
  11. Because of (9) and (10), some people should be allowed to own certain types of gun.
  12. Even those who have had lots of training in the safe and proper use of guns sometimes make mistakes and shoot someone who is not a threat.
  13. Some criminals are not afraid of death, and will commit mass murder even knowing that doing so will get them killed.
  14. In practice, most of the people who want to own a gun and should be allowed to own one probably own one already.
  15. Because of (3), (8), (11), (12), (13) and (14), arming everybody is neither a solution to gun crime nor even practical (no matter how much some gun enthusiasts would like it to be).
  16. As it is in the interests of gun sellers to sell as many guns as possible, it should not be up to the gun sellers to determine who should be allowed to own a gun.
  17. It would be difficult but possible to prevent most of the people who should not be allowed to have guns from having them, to reduce the number of guns in the hands of people who should not have them, and to reduce the number of gun deaths (accidental and deliberate) in the USA.

Another thought I had about this recently: In some senses, gun ownership is already de facto illegal (and has incredibly harsh penalties), even if it is technically legal. If a police officer thinks you have a gun and feels threatened by you, he can shoot you dead (effectively acting as judge, jury and executioner for the "crime" of gun ownership) and suffer no legal consequences. This is a risk for all gun owners (and, to a lesser extent, gun non-owners) whether or not the gun is "by the book" legal.

Rotating fractals

May 22nd, 2008

Fractals like z=z^2+c and z=z^3+c are easy to draw because they just involve complex addition and multiplication. What happens if we try to generalize these sorts of fractals to non-integer powers?

We can do this using the identities
\displaystyle z^w = (e^{\log_e z})^w = e^{w\log_e z}
\displaystyle e^{x+iy} = e^x(\cos y + i \sin y)
\displaystyle \log_e z = \log_e |z| + i(\arg z + 2n\pi)

The trouble is, we have to pick a value for n. Given a parameter \theta we can pick n such that \theta - \pi \leq \arg(x+iy) + 2n\pi < \theta + \pi (i.e. choose a branch cut of constant argument and a sheet of the Riemann surface): \displaystyle n = \lfloor\frac{1}{2}(\frac{\theta - \arg(x+iy)}{\pi} + 1)\rfloor. Then as we gradually vary \theta the fractal we get will change. For w = u+iv, v \ne 0, increasing \theta will cause z^w to "spiral in" from infinity towards the origin (v<0) or out (v>0). When v = 0, increasing \theta will have a periodic effect, which will cause the fractal to appear to "rotate" in fractally sort of ways with a period of \displaystyle \frac{2\pi}{u}.
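
In code, the branch choice might look something like this (a minimal sketch of the iteration step, assuming w, c and \theta are given; std::arg returns a value in (-\pi, \pi]):

  #include <cmath>
  #include <complex>

  // One iteration of z = z^w + c, with the sheet of the Riemann surface
  // chosen by theta as described above.
  std::complex<double> power(std::complex<double> z, std::complex<double> w,
                             double theta)
  {
      const double pi = 3.14159265358979323846;
      double argZ = std::arg(z);
      double n = std::floor(0.5*((theta - argZ)/pi + 1.0));
      // log z with imaginary part in [theta - pi, theta + pi):
      std::complex<double> logZ(std::log(std::abs(z)), argZ + 2.0*n*pi);
      return std::exp(w*logZ);                // z^w = e^(w log z)
  }

  std::complex<double> step(std::complex<double> z, std::complex<double> w,
                            std::complex<double> c, double theta)
  {
      return power(z, w, theta) + c;
  }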

It would be interesting to plot these as 3D images (like these) with \theta on the z axis. They would look sort of helical (or conical, for v \ne 0).

Using u < 0 poses some additional problems - the attractor isn't bounded by any circle of finite radius about the origin, so it's more difficult to tell when a point escapes and when it is convergent but just currently very far out. Clearly some further thought is necessary to plot such fractals accurately - perhaps there are other escape conditions which would work for these.

Diff algorithm

May 21st, 2008

Determining where the differences are between two files is a very common task in many areas of computing. I use it all the time when programming (to get an overview of my most recent changes). It's also used in file compression (so that only changed parts of big files need to be stored/transmitted) and in genetics.

The usual algorithm for diff is to find the longest common subsequence (LCS) between the two files, or equivalently, the edit script (the set of insertions and deletions you need to transform one file into the other).

However, this may not be the best algorithm for looking at differences between pieces of source code. People who edit source code don't just insert and remove text, they also move things around. This isn't handled so well by the LCS algorithm. Refinements are possible (like trying to match up adds with corresponding deletes to turn them into moves) but I think better results might be obtained by using an algorithm designed from scratch to handle moves.

I have an idea for such an algorithm. First, find the longest common substring (LCSS) between the two files (unlike a subsequence, the characters in a substring must be consecutive in the original string). Then, remove this string from both of the files, and repeat until either no characters are left or the substring is sufficiently short that it might just as well be an add and a remove rather than a move.
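
Here's a rough sketch of that loop. It uses the simple quadratic-time dynamic-programming LCSS rather than anything sophisticated, and just collects the common blocks rather than producing a full edit script:

  #include <string>
  #include <vector>

  struct Match { size_t posA, posB, length; };

  // Longest common substring by dynamic programming - O(N*M), good enough
  // for a sketch.
  Match longestCommonSubstring(const std::string& a, const std::string& b)
  {
      Match best = { 0, 0, 0 };
      std::vector<size_t> prev(b.size() + 1, 0), cur(b.size() + 1, 0);
      for (size_t i = 1; i <= a.size(); ++i) {
          for (size_t j = 1; j <= b.size(); ++j) {
              cur[j] = (a[i - 1] == b[j - 1]) ? prev[j - 1] + 1 : 0;
              if (cur[j] > best.length) {
                  best.length = cur[j];
                  best.posA = i - cur[j];
                  best.posB = j - cur[j];
              }
          }
          prev.swap(cur);
      }
      return best;
  }

  // Repeatedly pull out the longest common substring until what's left is
  // too short to be worth treating as a move; the remainder is adds/removes.
  std::vector<std::string> commonBlocks(std::string a, std::string b,
                                        size_t minLength)
  {
      std::vector<std::string> blocks;
      for (;;) {
          Match m = longestCommonSubstring(a, b);
          if (m.length < minLength)
              break;
          blocks.push_back(a.substr(m.posA, m.length));
          a.erase(m.posA, m.length);
          b.erase(m.posB, m.length);
      }
      return blocks;
  }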

This can be done fairly efficiently (the LCSS can be found in O(N) time with a sufficiently sophisticated algorithm) and has a number of advantages over the standard method:

  • The algorithm can be "tuned" for the particular type of files it's working on. For example, you might tell it to prefer edits which affect fewer syntactical units in the language you're working in.
  • If your edit script is too noisy, you can increase the minimum length of found substring.

The day after I originally wrote this, Bram Cohen (of BitTorrent fame) wrote this. I haven't had a chance to play with his algorithm yet (or indeed mine), but I'd be interested to see how they compare (pun intended).

2D game

May 20th, 2008

I don't tend to have time to play many computer games these days (though I made exceptions for Portal and Lost:Via Domus), but I still like to keep abreast of how technology trends change in games.

A concept that I think is interesting in games is to "raise the stakes" as the game progresses, so that later levels seem to be more important than earlier ones. A couple of contemporary games (neither of which I have played) that embody this principle are Katamari Damacy and Spore. As the game progresses, the very scale of the action increases, causing the early parts of the game to pale in comparison.

3D graphics have become much more impressive in the past few years but the trouble with 3D is that the extra degrees of freedom make the controls more complicated and difficult to learn (especially for people who don't play many 3D games).

I think it would be interesting to apply the "increasing scale" concept to a 2D game using modern graphical techniques. My idea is a 2D-scrolling shoot-em-up. You start off with a very small, simple "asteroids" spaceship with a very simple weapon. As you blow things up, you can collect more weapons which make your spaceship bigger. Using these bigger weapons, you can attack bigger and bigger targets. The view gradually "zooms out" as the size of the ship, weapons and targets increases.

This idea also reminds me of the Seven Force in Gunstar Heroes on the Megadrive. Easy parts of this boss appeared in an early level, and then a later level had a "scaled out" version which was much harder and used smaller sprites.

And with modern technology, 2D games can be seriously beautiful. Aquaria does actually zoom in and out a bit, but not by orders of magnitude as I'm imagining.

Devolving the CPU

May 19th, 2008

Until fairly recently, most computers could only do one thing at a time. To put it in technical terms, they had only one "execution unit" to take instructions from memory and actually act upon them. Computers have been able to apparently run many programs at once for much longer than that, through sleight of hand (switching the execution unit between different programs many times per second - so fast that you can't tell they're not executing simultaneously).

But in recent years, CPU manufacturers have been finding it more and more difficult to make execution units faster. CPU speeds increased exponentially until a few years ago, but the fastest CPUs you can buy have been about 3.6GHz for several years now.

But Moore's law (which says that the number of transistors you can put in a single chip doubles every 18 months) has continued to hold. So what do we do with these extra transistors, if we're not using them to make the execution unit faster?

Well, they're instead being used to put multiple execution units on a single CPU, so that computers really can execute multiple programs at the same time, without switching between them. This has the effect of making computers faster (since these days they're rarely doing only one thing at a time anyway, or if they are the work can often be divided amongst execution units to effectively increase the speed).

This means that programmers need to change the way they write programs to obtain maximum speed from the latest machines. They can no longer write a program as a single list of instructions (or a single "thread of execution", to use the technical term) and expect it to run 1000 times as fast in 15 years' time. Instead, programmers must write programs that use multiple threads and divide the work between them. This "multi-threaded" style of programming is more difficult and (if you don't know what you're doing) tricky errors can occur (for example, some errors only happen if two particular parts of the program happen to be executed at the same time, so they can be very difficult to reproduce).
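
As a small illustration of dividing work between threads (using the threading support that was added to C++ after this was written; the example is deliberately trivial):

  #include <thread>
  #include <vector>

  long long sumRange(const std::vector<int>& data, size_t begin, size_t end)
  {
      long long total = 0;
      for (size_t i = begin; i < end; ++i)
          total += data[i];
      return total;
  }

  // Each thread sums half of the data; the two halves are only combined
  // after both threads have finished, so they never race on shared state.
  long long parallelSum(const std::vector<int>& data)
  {
      long long first = 0, second = 0;
      size_t middle = data.size() / 2;
      std::thread t1([&] { first = sumRange(data, 0, middle); });
      std::thread t2([&] { second = sumRange(data, middle, data.size()); });
      t1.join();
      t2.join();
      return first + second;
  }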

Once we've got the hang of doing multi-threaded programming properly, having any one execution unit go really fast won't be so important, because programs will be able to go just as fast by dividing up their work between execution units. This means that we will see a redistribution of transistors on future CPUs - we will go from a few very fast but complex execution units to many slower but simpler execution units, because we can make CPUs with better overall throughput that way.

There are several ways to simplify an execution unit. You can get rid of speed features like pipelining, branch prediction, out-of-order execution and speculation. Removing these gives you another advantage - it becomes much easier to determine how fast a thread will execute, which means less optimization and tuning is needed.

Another way is to simplify the instruction set (the operation codes that the execution unit actually interprets). Most currently used instruction sets are very complicated (in order to squeeze as much power as possible from a single execution unit) so there is great scope for simplification there. All the single-instruction, multiple-data (SIMD) instructions can be removed (the parallelism they enable can be more easily realized with multiple execution units).

Another simplification would be to remove all the registers and use a pure stack machine architecture. A few processors that use this technique have been developed in the past, but they never took off as they were too slow. This may cease to be a consideration in highly parallel architectures.

Stack machines have a number of advantages besides simplicity. Compilers can also be made much simpler (the parse tree can be translated into code almost directly, with no register allocation or complicated instruction selection algorithms needed). The CPU stores less non-memory state, making the operation to switch an execution unit from one thread to another much quicker. This state could be as simple as an instruction pointer, a stack pointer and some context to identify the thread.
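
A toy stack machine makes both points concrete - the only per-thread state is an instruction pointer plus the stack, and an expression translates into code almost directly. The opcode set here is invented for illustration:

  #include <cstdio>
  #include <vector>

  enum Opcode { PUSH, ADD, MUL, HALT };

  int run(const std::vector<int>& code)
  {
      std::vector<int> stack;
      size_t ip = 0;                    // instruction pointer
      for (;;) {
          switch (code[ip++]) {
          case PUSH: stack.push_back(code[ip++]); break;
          case ADD:  { int b = stack.back(); stack.pop_back();
                       stack.back() += b; break; }
          case MUL:  { int b = stack.back(); stack.pop_back();
                       stack.back() *= b; break; }
          case HALT: return stack.back();
          }
      }
  }

  int main()
  {
      // (2 + 3) * 4 compiles straight from the parse tree:
      int program[] = { PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT };
      std::printf("%d\n", run(std::vector<int>(program, program + 9)));  // 20
  }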

Stack machines can also have much higher code densities than register machines, because you don't need to use any bits to specify which registers to work on - all instructions work implicitly on the top of the stack. Code density will continue to be important as the bus between CPU and RAM is a bottleneck that will only get worse as the number of execution units increases.

Another technique that can improve code density is bit-aligned instructions. By assigning short opcodes to the most commonly used instructions and longer opcodes to the rarer instructions, you effectively apply Huffman compression to the instruction stream. The Intel 432 used this technique (and was very slow, but as the shifting that is required to decode such an instruction stream is pretty easy to implement in hardware, I think this wouldn't be a big issue for a massively parallel CPU).

With a stack machine, certain patterns of instructions tend to occur over and over again. These can be assigned shorter codes (as is done in many popular lossless compression techniques like ZIP files).

Working out the optimal coding for this instruction set would be fairly simple in principle. First, design a minimal instruction set for a stack-based language without regard to code density (maybe just use one byte per opcode). Compile a large quantity of software to this architecture (spanning the range of possible uses - an operating system, application suite, development tools, games, emulators, number crunching, scientific visualization etc.). Next, determine the instruction frequencies across all these codebases and apply Huffman coding to figure out the optimal opcodes. With some fine-tuning, I think this could give significantly higher code densities than any of the architectures currently in use.
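
The frequency-to-opcode-lengths step might be sketched like this (the opcode names are strings purely for illustration; the actual bit patterns can then be assigned canonically from the lengths):

  #include <map>
  #include <queue>
  #include <string>
  #include <vector>

  struct Node
  {
      long long weight;
      std::vector<std::string> opcodes;   // all opcodes under this subtree
  };

  struct HeavierThan
  {
      bool operator()(const Node& a, const Node& b) const
      {
          return a.weight > b.weight;     // makes the priority_queue a min-heap
      }
  };

  // Given a count for each opcode, build a Huffman tree and return the bit
  // length each opcode would get.
  std::map<std::string, int> opcodeBitLengths(
      const std::map<std::string, long long>& frequencies)
  {
      std::priority_queue<Node, std::vector<Node>, HeavierThan> heap;
      std::map<std::string, long long>::const_iterator i;
      for (i = frequencies.begin(); i != frequencies.end(); ++i) {
          Node n;
          n.weight = i->second;
          n.opcodes.push_back(i->first);
          heap.push(n);
      }
      std::map<std::string, int> lengths;
      while (heap.size() > 1) {
          Node a = heap.top(); heap.pop();
          Node b = heap.top(); heap.pop();
          // Every opcode under either subtree ends up one bit deeper.
          for (size_t j = 0; j < a.opcodes.size(); ++j)
              ++lengths[a.opcodes[j]];
          for (size_t j = 0; j < b.opcodes.size(); ++j)
              ++lengths[b.opcodes[j]];
          Node merged;
          merged.weight = a.weight + b.weight;
          merged.opcodes = a.opcodes;
          merged.opcodes.insert(merged.opcodes.end(),
                                b.opcodes.begin(), b.opcodes.end());
          heap.push(merged);
      }
      return lengths;
  }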

Another advantage of having many execution units is that transistors can be assigned to different tasks in the most efficient possible way. For example, if you have one ALU per execution unit and it is only in use 70% of the time, you could instead have 7 ALUs per 10 execution units - the ultimate in superscalar design.