Archive for the ‘computer’ Category

What I want from an HDR workflow

Tuesday, June 3rd, 2008

Once I've got my HDR camera and my HDR monitor, I'll need new photographic workflow applications to get the images looking the way I want them. I expect that there will be a few parameters that I'll almost always want to tweak, much as I almost always re-crop my photos at the moment. These parameters are likely to be:

  • Colour balance (2D slider)
  • Exposure (slider)
  • Dynamic range compression (slider)
  • Tone mapping radius (slider)

The last two of these reproduce the functionality of current HDR applications, allowing creation of tone-mapped images for non-HDR output (like printing, or legacy monitors).

The high dynamic range revolution

Monday, June 2nd, 2008

Currently some people are making beautiful HDR images like these. The technique takes an input image with a high dynamic range (often composed of multiple exposures with different exposure times, to get good colour resolution over a wide range of brightnesses) and "compresses" that range down to what a monitor or printout can reproduce. This can give an effect similar to an oil painting (painters use similar range-compression techniques).

But such techniques will soon become unnecessary as the dynamic range that monitors can display increases. As I've mentioned before, I've seen this technology in action and it's seriously impressive - the pictures are incredibly realistic, like looking out of a window. As these monitors drop in price they will become ubiquitous, and then we will want to take pictures that take full advantage of them.

Shooting RAW with a good digital SLR goes some way towards this, but I think that with the new generation of monitors will come a new generation of cameras optimized for taking HDR images. This might be as simple as reading the sensor several times over the course of the exposure, or it might be a completely new sensor design.

With new monitors and new cameras, the entire graphics pipeline will be re-engineered for HDR.

Photographic workflow

Sunday, June 1st, 2008

The workflow that I use for the photographs on my website has remained pretty much unchanged for many years.

  1. Copy the photos from the card to the computer and then delete them from the card.
  2. Open the folder of photos in ACDSee and delete any obvious duds.
  3. Open all the photos in Paint Shop Pro 4 (yes, I know it's ancient but it works well, I know my way around all the tools and it's fast).
  4. I look for similar photos and close the ones that are redundant or unattractive, eventually whittling it down to the set of photos that will form a nice album page.
  5. I rotate (sometimes by arbitrary angles) and crop. Sometimes I'll adjust brightness and/or contrast to save a poor photo if there's something in particular that I want to have a picture of. Sometimes I'll use a more sophisticated program like Photoshop to remove red-eye or do other colour manipulations.
  6. Very occasionally I will use the clone tool to erase something that I don't want in the photo.
  7. I'll resample the photos to the appropriate size and save them as jpgs.
  8. Finally I'll manipulate the directory listing in a text editor to create the html file, add captions and upload the lot.

Someday I'll trade in my trusty Olympus C3000Z and get a nice digital SLR. But I might wait a few years because the high dynamic range revolution is coming. More about that tomorrow.

Compile-time symbolic computations

Saturday, May 31st, 2008

A compiler ought to be able to perform computations at compile time. Unlike the generated code, these computations don't have to be blazingly fast (since they only happen at compile time) and don't have to conform to any particular machine's architecture (since the language should behave the same across architectures anyway), so some nice things can be done.

Arbitrary-precision integer arithmetic is a nice easy one, and one that many scripting languages do already. This allows you to write expressions such as:

const int a = 1000000000000/1000;

and have a initialized to one billion even on 32-bit machines, where the intermediate value 1000000000000 doesn't fit in a machine word.

Rational arithmetic is also easy and useful. This allows one to write:

const int a = (1000/3)*3;

and have a initialized to exactly 1000 (rather than the 999 that truncating integer division would give), even on machines lacking any sort of floating point facility.
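C++ doesn't have this in general, but compile-time rational arithmetic does already exist in one corner of the language: the <ratio> header added in C++11. A minimal sketch of the (1000/3)*3 example using it:

#include <ratio>

// Compile-time rational arithmetic via std::ratio: (1000/3)*3 evaluated
// exactly, with the result checked during compilation.
typedef std::ratio_multiply<std::ratio<1000, 3>, std::ratio<3, 1> > Result;

static_assert(Result::num == 1000 && Result::den == 1,
              "(1000/3)*3 is exactly 1000 in rational arithmetic");

The awkward part is that the values live in the type system rather than in ordinary constants, which is exactly the kind of wart a proper compile-time language would avoid.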

Then there are closed forms such as:

const int a = sqrt(1000)^2;  // a==1000, not 961

One could even make a compiler understand pi as a symbolic constant of type "Real" and evaluate it to the appropriate number of significant figures for the expression and type that is being initialized. So:

const int a = truncate_cast<int>(pi*10000);

would initialize a to 31416.
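Ordinary C++ has a limited form of this in constexpr (C++11, extended in C++14): functions can be evaluated during compilation, but only in fixed-width machine arithmetic rather than symbolically. A sketch, with isqrt being a made-up helper rather than anything from the standard library:

#include <cstdint>

// Compile-time evaluation, C++14 style. isqrt is evaluated entirely during
// compilation, but it is still ordinary 64-bit integer arithmetic.
constexpr std::int64_t isqrt(std::int64_t n)
{
    std::int64_t r = 0;
    while ((r + 1) * (r + 1) <= n)
        ++r;
    return r;
}

constexpr std::int64_t a = isqrt(1000) * isqrt(1000);

// Without a symbolic sqrt there is nothing to cancel against the square,
// so a is 961, not the exact 1000 of the closed form above.
static_assert(a == 961, "fixed-width arithmetic loses the closed form");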

Once these things are in place, a compiler can perform quite sophisticated mathematical transformations, allowing programmers to write what they really mean and still obtain optimal code - even up to the level of Maple/Mathematica, if the compiler writers are sophisticated enough.

Compile time metalanguages

Friday, May 30th, 2008

The C programming language actually contains two different languages: C itself (whose code runs at run time) and the preprocessor language, which runs at compile time and has completely different syntax. The preprocessor is also not very powerful - programmers often resort to writing programs that generate C code as output instead.

It would be really nice to make both of these approaches unnecessary, and have a compile-time language that uses the same syntax as the run-time language. A compiler ought to be able to detect the parts of the program that operate on constant data, run them during compilation and emit the output directly into the generated binary.

Replacing #ifdefs is easy - just use the normal "if" and ensure that your compiler will optimize away false clauses for constant expressions. GCC does this, and indeed the GNU coding standards recommend it.
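In C-like syntax the idea looks something like this sketch (DEBUG_TRACING is a made-up flag standing in for whatever the build system would define):

#include <cstdio>

const bool DEBUG_TRACING = false;  // an ordinary compile-time constant

void process(int x)
{
    // The compiler can see that this branch is dead and omit it from the
    // generated code, just as if it were wrapped in #ifdef DEBUG_TRACING -
    // but both branches still get parsed and type-checked.
    if (DEBUG_TRACING)
        std::printf("processing %d\n", x);

    // ... the real work ...
}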

Replacing code that performs complex transformations on data is more complicated, but not intractably so as long as the compiler can reliably determine that all the functions used in these transformations are pure.

Lexical scoping and closures

Thursday, May 29th, 2008

A closure is just a piece of code with some context - the data on which it works. Such things are common in the functional programming community but were left out of the C language because they're actually quite difficult to implement properly.

The C++ language has sufficiently powerful constructs that one can create closures, but the syntax is very unwieldy. One must write:

#include <algorithm>
#include <vector>

class Totalizer
{
    int _total;
public:
    Totalizer() : _total(0) { }
    void operator()(int x) { _total += x; }
    int total() const { return _total; }
};

int total(std::vector<int> x)
{
    // for_each takes its functor by value, so keep the copy it returns.
    Totalizer t = std::for_each(x.begin(), x.end(), Totalizer());
    return t.total();
}

where one would really like to write something like:

int total(Vector<int> x)
{
    int t;
    for_each (x, lambda(int y) {t += y;} );
    return t;
}

Here the code is the "{t += y;}" bit and the context is the "t" variable, which is local to the "total" function.

C# 3.0 provides functionality like this.
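For what it's worth, C++ did eventually grow something very close to the wished-for syntax: the lambdas added in C++11. A sketch:

#include <algorithm>
#include <vector>

// The same total() with a C++11 lambda: the closure captures the local
// variable t by reference, so the body can update it directly.
int total(const std::vector<int>& x)
{
    int t = 0;
    std::for_each(x.begin(), x.end(), [&t](int y) { t += y; });
    return t;
}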

What's really going on here is that an object is being created which encapsulates some subset of the local variables (the "t" variable in this case) and which is passed to the for_each function (just like in the C++ version). Closures and objects are really just different syntaxes for the same thing.

If we want to return a closure from a function we can do so, but we must then allocate the object in such a way that the "t" variable will still be valid when the function exits. This is where things get really fiddly. It's easier in a language with garbage collection (you can just allocate the closure on the heap and the GC will take care of cleaning it up when it's no longer needed), but with care I think it should be possible to do it in a non-GC language as well.
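As a sketch of the non-GC case in C++ (using C++11's std::function and std::shared_ptr): the captured state is allocated on the heap and reference-counted, so it stays valid after the creating function returns.

#include <functional>
#include <memory>

// Returning a closure whose context must outlive the enclosing function.
// The counter lives on the heap; shared_ptr's reference count plays the role
// of the garbage collector and frees it when the last copy of the closure dies.
std::function<int()> make_counter()
{
    std::shared_ptr<int> count = std::make_shared<int>(0);
    return [count]() { return ++*count; };
}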

Stringy identifiers

Wednesday, May 28th, 2008

In most computer languages, identifiers can only contain alphanumeric characters (plus, often, the underscore '_') and can't start with a number. There are very good reasons for this, but there are occasions when it would be desirable to be able to use arbitrary strings as identifiers. For example, suppose that we have a language that has a parser generator built in. Suppose also that we give it a parsing rule like this:

WhileStatement := ("while" | "until") "(" Expression ")" Statement

This is really just another way of writing a parsing class:

class WhileStatement { ... }

We would like the members of this class to be automatically generated by the parser generator in a predictable way. With string-identifiers, the generated class might look something like this:

  class WhileStatement
  {
    Enum { `"while"`, `"until"` } `"while" | "until"`;
    Expression expression;
    Statement statement;
  }

i.e. the name of the member variable which holds the piece of data about whether this is a "while" loop or an "until" loop is called `"while" | "until"`. The type of this member is

Enum { `"while"`, `"until"` }

which means that it can take only the values `"while"` and `"until"`.

This class can be used as follows:

  WhileStatement whileStatement = WhileStatement::Parse(file);
  if (whileStatement.`"while" | "until"` == `"while"`)
    ...

This reads very naturally, and avoids the need for anyone (human or compiler) to make up variable names. It's good to spare the programmer from inventing names for things when it's not absolutely necessary.

Random numbers

Tuesday, May 27th, 2008

A commenter asked how random features like song shuffle work. How can a computer (which follows the same instructions each time) behave unpredictably?

Well, to understand this, first think about how random numbers are generated in the "real world" outside of computers. One way to do this is to throw dice - as long as you don't do something tricky, most people would agree that that's a perfectly reasonable source of randomness.

Except that if you really think about it, it isn't really random at all. In principle, one could make a machine that throws dice exactly the same way each time and always get a double six. You'd have to isolate it from stray air currents, minor gravitational influences from passing traffic and so on (as needs to be done for many sensitive scientific experiments). If you think about it, the randomness you get from throwing dice is really just due to these kinds of factors (and the fact that we human beings can't control our muscles finely enough to exert precisely the same forces each time). So really, what you're doing when you're throwing dice is "amplifying" these unpredictable factors (air currents, position, throwing force etc.) so that any tiny changes in them result in an entirely different number coming up.

Computers generate random numbers the same way. Well, sort of - they don't simulate the physics of three-dimensional dice being buffeted by air currents and bouncing off things - they do something that doesn't require such a complicated program, but the same principle is at work. One reasonably common method (not the best, but fine for uses such as games) is to multiply by 22695477, add one and then take the remainder after division by 4294967296. If I give you the sequence 953210035, 3724055312, 1961185873 you'd be hard pushed to tell me that those numbers were generated by that formula starting from the number 42 (maybe not quite so hard pushed as telling me the precise forces acting on dice given the numbers that came up, but you get the idea). Other random number generators use the same idea but with more complex formulae to make the patterns even more obscure.
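Here is that generator written out as a sketch in C++ (the constants are exactly the ones above; keeping the remainder after division by 4294967296 happens automatically because 32-bit unsigned arithmetic wraps around):

#include <cstdint>
#include <cstdio>

std::uint32_t state = 42;  // the starting value, or "seed"

std::uint32_t next_random()
{
    state = state * 22695477u + 1u;  // multiply, add one, wrap mod 2^32
    return state;
}

int main()
{
    for (int i = 0; i < 3; ++i)
        std::printf("%u\n", next_random());  // 953210035, 3724055312, 1961185873
}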

The problem with this method is that (as with any computer program) given the same input you're going to get the same output every time. And indeed this was a problem with some home computers and calculators in the 80s - if you tried to use the random number generator right after turning the machine on you'd get the same "random" numbers each time. The solution is to "seed" the random number generator somehow. The usual method is to use the current time. While it's predictable in some sense, you will get a different random sequence every time in practice so for most purposes you will never notice a pattern.

For some other purposes (like generating cryptographic keys or passwords, which have to be unpredictable even to people who are trying very hard to predict them) just using the time isn't enough and you have to mix in other sources of unpredictability (or "entropy" to use the technical term). This is why some SSH clients ask you to bash on the keyboard or wiggle the mouse the first time they are run - someone who wants to intercept your communications would have to know exactly which keys you pressed and how you wiggled the mouse (with extremely high precision). This is very easy to get wrong (and difficult to know when you've got it wrong) and can lead to security holes. Hence the old saw that "generation of random numbers is too important to be left to chance"! Some computers even have hardware built in which generates random numbers from unpredictable physical processes like thermal noise or quantum wavefunction collapse.

Once you've got a sequence of (pseudo-)random numbers evenly distributed over some range (for example 0 to 4294967295 in the above example) there are a variety of techniques to massage these into various different forms for different purposes. If you want an integer in the range 1 to 10 you can find the remainder after division by 10 and then add 1. If you want a random real number in the interval 0 to 1 you can divide by 4294967295. If you want numbers from a Gaussian distribution you can use a Box-Muller transform. And if you want to shuffle songs in your playlist you can start with an empty playlist and add a random song to it (removing it from the original playlist each time) until the original playlist is empty.
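That last shuffle can be sketched like this (the function is made up for illustration; C++11's <random> supplies the underlying evenly distributed numbers, but any generator like the one above would do):

#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Build a shuffled playlist by repeatedly picking a random remaining song
// and moving it to the new list until the original is empty.
std::vector<std::string> shuffled(std::vector<std::string> playlist, std::mt19937& rng)
{
    std::vector<std::string> result;
    while (!playlist.empty())
    {
        // A random index in the range 0 to playlist.size()-1.
        std::uniform_int_distribution<std::size_t> pick(0, playlist.size() - 1);
        std::size_t i = pick(rng);
        result.push_back(playlist[i]);
        playlist.erase(playlist.begin() + i);
    }
    return result;
}

(std::shuffle in <algorithm> does the same job in place, so in real code you would just use that.)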

This last algorithm has a flaw in some sense (though it's more of a flaw in our brains). While all possible orders are equally likely, it will tend to play two songs in a row by the same artist more often than you would expect (just look at the bug database for any open source media player application to see the evidence for this). The problem isn't that the numbers aren't random enough - it's just that if you have n artists in your playlist you'll see this phenomenon once every n songs on average. We tend to notice when it happens, and because the times when it happens stand out more than the times when it doesn't, the problem seems worse than it really is. Another factor is that we are used to radio stations, which deliberately avoid doing this by hand-picking the order of songs rather than using a random shuffle. Some media player applications have appeased the complainers by actually making their shuffle features *less* random - adjusting the algorithm so that it avoids picking orders which have two songs by the same artist in a row. This seems hacky to me, but if users like it I suppose I can't argue with that.

Object relational mismatch

Monday, May 26th, 2008

I don't have a lot of experience with SQL databases. Most of the programming I have done is object oriented. It's a little odd to hear about the "object-relational impedance mismatch", because as far as I can tell the two paradigms complement each other quite nicely.

Object oriented programming creates complex things by sticking simple things together. So you might have a Customer object which can have any number of Order records. The Order records associated with a particular customer are easily accessible to code running in the context of a particular Customer.

Relational programming slices things up the other way. There is a table of Customers and (completely separately) a table of Orders, each of which points back to a particular Customer. All the Orders are stored together no matter which Customer did the ordering. This has the advantage that you can easily look through the Orders (e.g. for every Order of a particular widget) without looking through all the Customers first.

The downside is that relational programmers spend lots of time writing things like "SELECT o FROM Orders where o.customer==customer" to retrieve the data that would just "be there" in context, if you were using objects.

It seems natural to try to combine these two styles - to persist objects to long-term storage by slicing them up into normal-form components which are then stored in tables representing the different classes, and to automatically generate queries for interesting chunks of related data and construct objects based on them. Then you could generate different object structures for different problems (for example, your Order objects might be organized by widget instead of by customer for stock-keeping purposes).
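As a sketch of the two views side by side (all the type and field names here are invented for illustration, not taken from any particular library):

#include <map>
#include <string>
#include <vector>

// Relational view: flat rows, all Orders stored together.
struct CustomerRow { int id; std::string name; };
struct OrderRow    { int id; int customerId; std::string widget; };

// Object view: each Customer carries its own Orders.
struct Customer
{
    CustomerRow row;
    std::vector<OrderRow> orders;  // the data that would "just be there"
};

// Rebuild the object view from the tables - the hand-written equivalent of
// running "SELECT ... FROM Orders WHERE customer = ?" for every customer,
// which is the work an object-relational layer would automate.
std::map<int, Customer> buildObjects(const std::vector<CustomerRow>& customers,
                                     const std::vector<OrderRow>& orders)
{
    std::map<int, Customer> byId;
    for (const CustomerRow& c : customers)
        byId[c.id].row = c;
    for (const OrderRow& o : orders)
        byId[o.customerId].orders.push_back(o);
    return byId;
}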

I know some things like this have been done, but it seems such a clearly useful thing to do that I'm surprised it isn't completely ubiquitous by now. Why don't OSes come with a transactional database as a basic system service? Is it just that the object people and the relational people don't talk as much as they should? It would be really cool if a compiler stored its syntax tree in such a way that I could make a list of (for example) all the "if" statements in a program being compiled.

Mixins in C++

Sunday, May 25th, 2008

A desirable property of C++ class libraries is to be able to leave the inheritance hierarchy up to the user of the library. For example, suppose you are writing a UI framework. You have a "Window" class - every window in the system is represented by an object of some subclass of Window. Windows can be animated, scrollable or dockable. Each of these attributes has some data and methods associated with it, so naturally forms a class. But any combination of these attributes also forms a class. So to be fully general, a library with n such attributes would have to provide 2^n possible classes - clearly not very practical.

A solution is to provide each attribute as a Mixin (the name was inspired by an ice cream parlour near MIT which allowed you to mix in extra items like crumbled cookies and jelly babies into your base ice cream flavour). A Mixin is a template class parameterized on its superclass, like so:

template<class Base> class Animated : public Base { ... }

Then you can specify a class hierarchy like:

Animated -> Scrollable -> Dockable -> Window

as the template class:

Animated<Scrollable<Dockable<Window> > >

The trouble with this is: how do we construct an object of such a type? An object's constructor needs to know how to construct its base class, which means that you'd still need to write 2^n constructors. I found a paper which provides a rather convoluted solution using all sorts of macros and template programming wizardry, but I thought of a much simpler way.

All you need to do is have some sort of standard for constructing an object. For example:

template<class Base> class Animated : public Base
{
public:
    class Params
    {
        friend class Animated;
    public:
        Params(typename Base::Params bp, ...) : _bp(bp) { ... }
    private:
        typename Base::Params _bp;  // stored by value so temporary Params chains stay valid
    };
    Animated(Params p) : Base(p._bp) { }
    ...
};

And similar nested Params classes (with the same name) for all the other mixins and the base flavour class.

Then you can instantiate your composite class as follows:

Animated<Scrollable<Dockable<Window> > >
    window(Animated<Scrollable<Dockable<Window> > >::Params(
        Scrollable<Dockable<Window> >::Params(
            Dockable<Window>::Params(
                Window::Params("blat"), "baz"), "bar"), "foo"));

or, if you don't like typing n^2 identifiers:

Window::Params wp("blat");
typedef Dockable<Window> D;
D::Params dp(wp, "baz");
typedef Scrollable<D> S;
S::Params sp(dp, "bar");
typedef Animated<S> A;
A::Params ap(sp, "foo");
A window(ap);

You also get the advantage that your calling code says exactly which parameters belong to which mixin, improving type checking (similar to the named parameters construct described in D&E).