Archive for the ‘computer’ Category

Commandments of text editing

Thursday, October 1st, 2009

As a programmer, one of the main user interfaces I work with day-to-day is a text editor. I find this interface to be extremely natural and efficient, when it's done right. A text editor isn't the same thing as a word processor - text is all monospaced and there are no fonts or embellishments like bold, italic and underline. Colour may be present for syntax highlighting purposes, but you can't set the colour of individual characters. An important aspect of a text editor is that what you see should be exactly what you get - there should be no "hidden" information. For this reason I eschew spaces at the ends of lines and tab characters (though unfortunately documents with such things sometimes have to be handled for compatibility reasons).

I find that a text editor is best thought of as a (very large) 2D grid, with one character in each cell of the grid. Moving around this grid should be predictable - the left arrow key should always move the cursor one space to the left (unless it's at the beginning of the line, in which case it should do nothing). Similarly for the other arrow keys. Home should always move the cursor to column 1 of the current line, End should move it to the cell after the last character in the current line. If you move the cursor from the end of a long line up or down to a shorter line, the cursor shouldn't jump to the end of the shorter line, it should remain in the same column, and spaces should be inserted as necessary if you type in such non-existent locations.
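
Here's a minimal C++ sketch (with names I've made up) of how such a grid buffer might behave - typing beyond the end of a line pads it with real spaces first, so nothing is hidden:

    #include <string>
    #include <vector>
    #include <iostream>

    // Hypothetical grid buffer: one string per line, cursor free to sit
    // beyond the end of a line ("virtual space").
    struct GridBuffer
    {
        std::vector<std::string> lines{""};
        int row = 0, col = 0;

        void moveLeft() { if (col > 0) --col; }           // never wraps to the previous line
        void moveUp()   { if (row > 0) --row; }           // column is kept, even past line end
        void moveDown() { if (row + 1 < (int)lines.size()) ++row; }
        void home()     { col = 0; }
        void end()      { col = (int)lines[row].length(); }

        // Typing in virtual space pads the line with real spaces first,
        // so what you see is exactly what is stored.
        void type(char c)
        {
            std::string& line = lines[row];
            if ((int)line.length() < col)
                line.append(col - line.length(), ' ');
            line.insert(line.begin() + col, c);
            ++col;
        }
    };

    int main()
    {
        GridBuffer b;
        b.lines = {"a long line of text", "short"};
        b.row = 0; b.end();   // cursor at column 19
        b.moveDown();         // still column 19, past the end of "short"
        b.type('!');          // "short" is padded with spaces before the '!' is inserted
        std::cout << '[' << b.lines[1] << "]\n";
    }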

To get around the document more quickly, the PgUp and PgDn keys should scroll the document up and down a screenful at a time (leaving the cursor in the same location on the screen). Pressing the arrow keys with Ctrl held down should move the cursor a predictable amount (say 5 characters/lines at a time). Many editors take Ctrl+Left and Ctrl+Right to mean "move left a word" and "move right a word" but I think consistency in number of characters moved is more useful.

Ctrl+PgUp and Ctrl+PgDn should take one to the beginning or end of the document respectively.

Moving through the document while keeping the cursor in the same place is useful too - let's use the Alt key for this. If Alt is held down the same move is performed relative to the document but the cursor is kept in the same place on the screen.

The shift key should be used to select text (i.e. if you move the cursor with the shift key held down, the text from the point where you pressed shift to the current cursor location should be selected). One should also be able to use the mouse for selecting text (though it's rare that I find myself doing this in a good text editor). Selecting rectangular areas of text is also occasionally useful, though sufficiently rare not to need a modifier key dedicated to it. Typing in a selection should delete it, as should the Del key when text is selected. When text is not selected, Del should delete the character or line break to the right of the cursor.

It's very important that all the CUA keyboard shortcuts work:
Ctrl+C for copy
Ctrl+X for cut
Ctrl+V for paste
Ctrl+Z for undo
Ctrl+Y for redo
Ctrl+S for save
Ctrl+A for select entire document
Ctrl+F for search

The insert key should toggle insert mode (very useful when working with fixed-width data).

A few other things I miss if they aren't there: F3 to load a new file. Alt+W to save the current selection as a new file. Alt+R to read a file and insert it into the document at the cursor position. Esc to switch to a menu for accessing less often used functions such as macros and window splitting. Alt+F6 for switching to the next file in the ring.

The text editor I have found that follows most of these ideals is TSE, and it's always one of the first things I install on a new machine. With some tweaking I could probably get it even closer to my ideal. Unfortunately it doesn't run well on Linux at the moment. For this reason (and others) one of these days I may write my own text editor to replace it.

Why is TV static in black and white?

Sunday, August 16th, 2009

Have you ever looked at the static that appears on a TV screen when it isn't tuned into anything? If so, you might have noticed that it's in black and white. That fact always used to puzzle me - the patterns are random so surely all the colours that can appear on the TV should be equally likely, right?

It wasn't until fairly recently that I learned why this is. Colour TV signals are a little different from black-and-white TV signals - a certain frequency band within the signal is used to transmit colour information. That band corresponds to high frequency horizontal detail (patterns about 1/200th of the width of the screen). In a colour TV signal, those details are elided and that part of the signal is used to carry hue and saturation information instead.

However, if you're watching a black and white programme you can get a sharper picture by using those frequencies for horizontal detail. So colour TV sets were designed to have two "modes" - colour mode and black-and-white mode. A "colour burst" signal is broadcast in an otherwise unused part of the signal which has the dual purposes of signalling that colour information is available, and calibrating the correct hue phase offset (the "colour burst" signal, if it were on screen and within gamut, would be a very dark olive green colour). This signal has to be present for about half a field before the TV will switch to colour mode. This is an imperceptibly short time but stops the TV flickering in and out of colour mode if the signal is marginal.
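
As a toy model (this isn't how the analogue circuitry actually works - it's just an illustration of the timing requirement), the mode decision might look something like this:

    #include <cstdio>
    #include <cstdlib>

    // Toy model of the colour-mode decision: only switch to colour mode after
    // seeing a valid colour burst on roughly half a field's worth of
    // consecutive scanlines (~131 lines for a 262-line field).
    int main()
    {
        const int linesNeeded = 131;     // about half a field
        int consecutiveBursts = 0;
        bool colourMode = false;

        for (int line = 0; line < 10 * 262; ++line) {
            // With random noise, a convincing burst on any given line is very
            // unlikely; here it's modelled as a 1-in-1000 chance.
            bool burstDetected = (rand() % 1000) == 0;

            consecutiveBursts = burstDetected ? consecutiveBursts + 1 : 0;
            colourMode = (consecutiveBursts >= linesNeeded);
        }
        printf("colour mode after 10 fields of noise: %s\n", colourMode ? "on" : "off");
    }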

Having a signal of the correct frequency at the correct time for that period of time is extremely unlikely to occur by chance (and even if it did, it would disappear again before you had the chance to notice it). So when the TV is showing static, it thinks it's showing an old black-and-white movie and turns off the colour interpretation circuitry, leading to black-and-white static.

Component pattern

Thursday, August 13th, 2009

Sometimes, your object hierarchy is determined by the kinds of data you're throwing around - if you have a pile of pieces of information about each customer in a database, it's obvious that you should have a Customer object. But if this is the only rule you use for deciding when to make a class, you'll often end up with enormous classes. This tends to happen for the object representing the program itself. A program tends to have a lot of things that there are only one of, and if you stick all these things in one big object then it's unwieldy.

So you break up your Singleton object into components. You can do this in whatever way makes most logical sense for your program - perhaps a set of routines and data for allocating memory go in one object, and a set of routines and data for dealing with the program's main window go in another. Then you need a master object responsible for wiring all these components together. The construction of this object may be somewhat non-trivial:

In the master constructor, each of the components is constructed. However, we don't tell the components about the master yet because its constructor is incomplete.

Once the constructor is complete, we need to loop through all the components again and "site them" (give them a pointer to the master that they can use to obtain pointers to other components they need).

Depending on how complex the interdependencies between components are, more initialization steps may be needed. The order of the components may also make a difference.
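
Here's a minimal C++ sketch of this two-phase construction (the class names are just for illustration, not from any particular codebase):

    #include <memory>
    #include <iostream>

    class Master;

    // Each component gets a chance to grab pointers to its siblings once the
    // master is fully constructed.
    class Component
    {
    public:
        virtual ~Component() = default;
        virtual void site(Master* master) { _master = master; }
    protected:
        Master* _master = nullptr;
    };

    class Allocator : public Component { /* memory management routines */ };

    class MainWindow : public Component
    {
    public:
        void site(Master* master) override;
    private:
        Allocator* _allocator = nullptr;
    };

    class Master
    {
    public:
        // Phase 1: construct the components. They don't know about the master
        // yet because its construction is incomplete.
        Master() : _allocator(new Allocator), _window(new MainWindow) { }

        // Phase 2: once construction is complete, "site" each component.
        void site()
        {
            _allocator->site(this);
            _window->site(this);
        }

        Allocator*  allocator() { return _allocator.get(); }
        MainWindow* window()    { return _window.get(); }

    private:
        std::unique_ptr<Allocator>  _allocator;
        std::unique_ptr<MainWindow> _window;
    };

    void MainWindow::site(Master* master)
    {
        Component::site(master);
        _allocator = master->allocator();   // safe: the master is fully built by now
    }

    int main()
    {
        Master master;
        master.site();
        std::cout << "components wired up\n";
    }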

This doesn't have to be used for just the singleton - any sufficiently complex object can benefit from being broken up this way.

I looked for this pattern in the literature but didn't see it anywhere. However, it seems to be a reasonably common thing to do so I'm recording it here.

CGA Hydra

Tuesday, August 11th, 2009

A while ago, Trixter challenged me to figure out if it was possible for a CGA card with both composite and RGB monitors attached to it to display a different image on each display. At first I thought this was impossible because the composite output is just a transformation of the RGB output - the RGB output contains all the information that the composite output contains.

But that reasoning only works if you're close up. If you stand back sufficiently far from the screens, adjacent pixels will blur into each other so this is no longer necessarily true. Suppose we have a pattern that repeats every 4 high-resolution pixels (or half an 80-column character, or 1/160th of the screen width, or one colour carrier cycle) and we stand sufficiently far back that this looks like a solid colour. On the RGB monitor this will just be an average of the 4 colours making up the pattern. So, for example, black-black-white-black and white-black-black-black will look the same on the RGB monitor, but they will look different on the composite monitor because these two patterns have different phases with respect to the color carrier, so they will have different hues.
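
Here's a little C++ sketch (not CGA-accurate, just an illustration) showing that the two patterns have the same average brightness but different phases relative to the colour carrier:

    #include <cmath>
    #include <cstdio>

    // Two 4-pixel patterns that average to the same intensity (so they match on
    // the RGB monitor from a distance) but have different phases relative to the
    // colour carrier (so they produce different hues on the composite monitor).
    int main()
    {
        const double patternA[4] = {0, 0, 1, 0};   // black-black-white-black
        const double patternB[4] = {1, 0, 0, 0};   // white-black-black-black

        auto analyse = [](const double* p, const char* name) {
            const double pi = 3.141592653589793;
            double sum = 0, i = 0, q = 0;
            for (int n = 0; n < 4; ++n) {
                double phase = 2 * pi * n / 4;     // one carrier cycle per 4 pixels
                sum += p[n];
                i += p[n] * std::cos(phase);       // in-phase component
                q += p[n] * std::sin(phase);       // quadrature component
            }
            printf("%s: average %.2f, hue phase %.0f degrees\n",
                   name, sum / 4, std::atan2(q, i) * 180 / pi);
        };
        analyse(patternA, "black-black-white-black");
        analyse(patternB, "white-black-black-black");
    }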

That explains how we can get details on the composite monitor but not on the RGB monitor, but what about the other way around? This is a bit more complicated, because it requires knowing some more details about how the CGA generates (non-artifact) colour on the composite output. For each of the 8 basic colours (black, blue, green, cyan, red, magenta, yellow and white) there is a different waveform generated on the card. The waveform for the current beam colour is sent to the composite output. The waveforms for black and white are just constant low and high levels respectively, but the waveforms for the 6 saturated colours are all square waves of the colour carrier frequency at different phases. The green and magenta lines switch between high and low on pixel boundaries, the other 4 at half-pixel boundaries (determined by the colour adjust trimpot on the motherboard).

What this means is that if you're displaying a green and black or magenta and black image, the pixels are essentially ANDed with this square wave. The pixels corresponding to the low parts of these waves have no effect on the composite output. So you can use these pixels to make the image on the RGB monitor lighter or darker whilst having no effect on the composite image.

Here's what the finished result is supposed to look like (with another image on an MDA display as well):

Note that I've allowed the composite image to show through on the RGB monitor a little in order to improve contrast.

Floating point is not evil

Monday, August 10th, 2009

In response to this:

Floating point is not evil and it is deterministic, but you need to know what the values you're working with actually are. Basically, floating point is like scientific notation, except with 2s instead of 10s. In other words, instead of storing numbers like 1.234*10^4 it stores numbers like 1.0011010010*2^10 (with the mantissa in binary). It's actually stored as a pair of numbers, a mantissa and an exponent. The number of bits in the mantissa is fixed (it's 23 bits for single precision and 52 for double) and the leading "1" is implied. Each number representable as an IEEE floating point constant has exactly one representation except for zero (+0 and -0 have different bit patterns for complicated reasons). There are complications (denormals, NaNs and infinities) which can usually be ignored and which I won't go into.
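
For instance, the binary example above (1234) can be pulled apart into its sign, exponent and mantissa fields directly - a sketch for single precision, ignoring the special cases:

    #include <cstdio>
    #include <cstring>
    #include <cstdint>

    // Pull apart an IEEE single-precision float into its sign, exponent and
    // 23-bit mantissa fields (ignoring denormals, NaNs and infinities).
    int main()
    {
        float f = 1234.0f;
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);

        uint32_t sign     = bits >> 31;
        int      exponent = (int)((bits >> 23) & 0xff) - 127;   // remove the bias
        uint32_t mantissa = bits & 0x7fffff;                    // leading 1 is implied

        printf("%g = %c1.", f, sign ? '-' : '+');
        for (int bit = 22; bit >= 0; --bit)
            putchar((mantissa >> bit) & 1 ? '1' : '0');
        printf(" (binary) * 2^%d\n", exponent);   // prints 1.0011010010... * 2^10
    }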

Floating point numbers are handy for lots of purposes but they do have a couple of problems.

The first problem is that floating point numbers are inefficient. They are very quick with today's hardware but consider what you'd do if neither floating point hardware nor floating point libraries were available. For most applications, you'd use fixed point numbers - you'd store an integer and it would be implied by the type of number you're working with that the actual numeric value is obtained by dividing this integer value by 2^n for some n. For most purposes you probably wouldn't store that n with each integer - all your numbers have the same number of fractional bits. For example, if you're writing a graphics program you might decide that units of 1/256 of a pixel width are always enough, so n would always be 8. When writing floating-point programs, most programmers don't do this calculation to figure out what the precision needs to be, they just use single precision floating point or switch to double if that isn't precise enough. While constant precision is preferable for a general purpose calculator, most actual applications are better served by constant resolution.
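
Here's a sketch of what that looks like in code, using 1/256-pixel units as in the graphics example (the type and its operations are made up for illustration):

    #include <cstdio>
    #include <cstdint>

    // Fixed-point coordinate in units of 1/256 of a pixel: the integer is
    // understood to be the real value multiplied by 2^8, so the scale factor is
    // implied by the type rather than stored with each number.
    struct Fixed
    {
        int32_t raw;   // value = raw / 256.0

        static Fixed fromPixels(double pixels) { return {(int32_t)(pixels * 256 + 0.5)}; }
        double toPixels() const { return raw / 256.0; }

        Fixed operator+(Fixed o) const { return {raw + o.raw}; }
        Fixed operator-(Fixed o) const { return {raw - o.raw}; }
        // Multiplication needs a shift to keep the result in the same units.
        Fixed operator*(Fixed o) const { return {(int32_t)(((int64_t)raw * o.raw) >> 8)}; }
    };

    int main()
    {
        Fixed x  = Fixed::fromPixels(3.5);
        Fixed dx = Fixed::fromPixels(0.25);
        Fixed result = x + dx * dx;
        printf("3.5 + 0.25*0.25 = %g pixels\n", result.toPixels());   // 3.5625
    }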

The other problem is that sooner or later you'll run out of precision. If you're plotting Mandelbrot sets, sooner or later you'll zoom in far enough that adjacent pixels have complex numbers with the same floating-point representation. If you're using FFTs to multiply big integers, sooner or later you'll want to multiply integers so large that floating-point numbers won't have sufficient precision. If you're using hardware floating point, this is quite difficult to solve (you need to find or write a big-float library) and will cause a big speed hit, so most people will give up at that point. However, if you're already using fixed point bignums, it's just a question of adding another digit.

378Kb disk hack for Fractint 15.1

Wednesday, August 5th, 2009

I first heard of Fractint soon after I got interested in fractals. I knew it was very fast and had lots of fractal types and features and I wanted to run it on the PC1512 (the only PC I had regular access to at the time). I bought a "Computer Shopper" magazine which had a copy of Fractint 15.1 on the cover disk. Only the 3.5" disk was available, so I had to get a friend who had both drives to copy it onto 5.25" disks for me. The disk contained a 323Kb self-extracting ZIP file which, when run, produced (amongst other files) FRACTINT.EXE which was a 384Kb file. This would not have been a problem except that the machine only had 360Kb disks - FRACTINT.EXE wouldn't fit!

Somewhere (probably on another magazine disk) I found a program that was able to format disks to non-standard formats by calling BIOS functions (or possibly by programming the drive controller directly, I don't remember). It turns out that the drives in the PC1512 were actually able to format 42 tracks before the head bumped into the stop, not just the standard 40. This gave me a 378Kb disk, still not quite enough. I discovered that if I requested the program to format a 44-track disk, it would fail but record in the boot sector that the disk was 396Kb. With this format, I was able to persuade the ZIP program to try extracting FRACTINT.EXE. Of course, it failed to write the last few Kb but it turned out that because Fractint used overlays, it still worked - only one or two of the overlays were inaccessible (I think printing and shelling to DOS were broken, and I didn't miss them).

I made a Mandelbrot zoom video with this setup. It was on the point +i (zooming into a more detailed area would have taken too much time and disk space). I played the video back by converting the images to PCX files (which are much quicker to decode than GIFs) copying them to a ramdrive and then using PICEM to play them back at about 5fps.

Bugs are good

Monday, August 3rd, 2009

This may strike horror into the hearts of the test-driven-development crowd, but it makes me nervous when I write a program and it works correctly the first time. Sure it works, but maybe the code is subtly broken in some way and it only works by accident. Maybe if you sneeze too loudly nearby it will blow up. Maybe if I add some feature it will blow up. Until I've given it a decent workout there's just no way of knowing.

But a program with a bug is a puzzle - solving the puzzle gives one purpose, and an opportunity to learn more about how the code works (or why it doesn't). Sometimes in the process of debugging I'll change parameters or add debugging code, exploring the design space in order to get a better idea about what's going on. Sometimes doing this will give me ideas for improvements unrelated to fixing a particular bug - a way to improve performance or a new feature to add.

I have much greater confidence in a program that I've fixed lots of bugs in than one that works first time.

Installing Vista the difficult way

Saturday, August 1st, 2009

I recently swapped the 100Gb hard drive that my laptop came with for a 320Gb model. I kept running out of space and deleting operating system components that I wasn't using, which is all very well except that when I tried to install Vista SP2 it complained that important files were missing. The SP2 installer doesn't really need those files either, it just does an integrity check.

After having installed the new drive I started to install Windows and then discovered that my laptop's DVD drive was in bad shape. It could read CDs fine but not DVDs at all.

I had read that it was possible to install Vista from a USB key so I tried this. Unfortunately the only USB key I have handy is a 1Gb one and Vista claims it needs a 4Gb one to install from. There doesn't appear to be anywhere close by that sells USB keys.

I wondered if it was possible to boot from the USB key, copy the required files to the hard drive over the network using the preinstallation environment and then install from the hard drive. With a bit of fiddling I got it to work, and here is how I did it.

  1. Format the USB key to NTFS and copy the files over, excluding install.wim. This only takes about 256Mb. Make the key bootable with "bootsect /nt60" and boot from it.
  2. Choose "repair" at the computer and open a command prompt. Use diskpart to partition and format the drive.
  3. Now we need to get the installation files onto the hard drive, including install.wim. We can get TCP/IP networking up and running with netsh but the Vista preinstallation environment doesn't include SMB file sharing. It does however include the ftp client, so we can set up a small FTP server on another machine and then ftp install.wim over. The rest of the files can just be copied from the USB key.
  4. The next part is tricky - we need to convince the installer to run from the hard drive and also install to the hard drive. If we make the hard drive bootable the same way we made the USB key bootable and try to boot from it, the installer gets very confused and gets stuck in a loop after a couple of reboots. However, I hit upon another way which does work. Unmount the USB key and the hard drive with "mountvol /d" and then mount the hard drive with the same drive letter that was being used for the USB key. Then run "setup" and install Vista as normal.

Addendum: I tried to do this again with a new computer I built, but the Vista preinstallation environment didn't have drivers for my network card. Obviously if this is the case, another method must be used.

Complexity metric for identifiers

Friday, July 31st, 2009

As far as I know, every code complexity metric ever devised treats all identifiers equally. After all, they're all just opaque strings to the computer, right?

But complexity metrics are not designed for the benefit of computers - they're designed for people, to try to make the code less complex and easier for people to understand (otherwise we could just measure the length of the generated binary code).

And there is certainly something less complex about a variable called "cost" than a variable called "amountPaidPerItemInDollars" for example - a program using the second should surely be considered more complex than an otherwise identical program using the first, right?

On the other hand, one doesn't necessarily want to count all the letters in an identifier to measure its complexity - that would just lead to very cryptic or meaningless 1 and 2 letter variable names.

I think the answer is to divide identifiers up into words (in case sensitive languages by starting each word with a capital letter, and in non-case-sensitive languages by separating words with underscores). Each word counts for one point and should be a real word in the English language, or defined elsewhere in the program as a sequence of other words (perhaps with a special comment that the complexity measurer can understand). So, for example, instead of having a lot of variables containing the characters hyperTextMarkupLanguage, one would have a glossary section in one's program saying "html == hyper text markup language" and then treat "html" itself as a word.
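
A rough C++ sketch of how such a metric might be computed (the word list and glossary here are tiny stand-ins for a real dictionary, and the extra penalty for unrecognised words is my own embellishment):

    #include <iostream>
    #include <map>
    #include <set>
    #include <string>
    #include <vector>
    #include <cctype>

    // Split a camelCase or underscore_separated identifier into lower-case words.
    std::vector<std::string> splitWords(const std::string& id)
    {
        std::vector<std::string> words;
        std::string word;
        for (char c : id) {
            if (c == '_' || isupper((unsigned char)c)) {
                if (!word.empty()) { words.push_back(word); word.clear(); }
                if (c != '_') word += (char)tolower((unsigned char)c);
            } else
                word += c;
        }
        if (!word.empty()) words.push_back(word);
        return words;
    }

    // One point per word; a glossary entry counts as one word even though it
    // expands to several. Words in neither list are penalised (my own rule).
    int complexity(const std::string& id,
                   const std::set<std::string>& dictionary,
                   const std::map<std::string, std::string>& glossary)
    {
        int points = 0;
        for (const std::string& w : splitWords(id))
            points += (glossary.count(w) || dictionary.count(w)) ? 1 : 2;
        return points;
    }

    int main()
    {
        std::set<std::string> dictionary =
            {"cost", "amount", "paid", "per", "item", "in", "dollars", "parser"};
        std::map<std::string, std::string> glossary =
            {{"html", "hyper text markup language"}};

        std::cout << complexity("cost", dictionary, glossary) << "\n";                       // 1
        std::cout << complexity("amountPaidPerItemInDollars", dictionary, glossary) << "\n"; // 6
        std::cout << complexity("htmlParser", dictionary, glossary) << "\n";                 // 2
    }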

Making up terminology is an important part of programming, and one that I think is often overlooked. Giving a decent (appropriate, one word) name to each of the important concepts in your program from the get-go, and giving each of these terms a precise meaning (even if some details of those meanings change as the program evolves) causes one to be able to think about these concepts more clearly. It also leads to easier and more consistent naming of variables and types.

Manifesto for a new demoscene

Thursday, July 30th, 2009

Since computers became powerful enough to decode compressed audio and video in real time, and storage capacities and bandwidths have increased to make storage and transmission of such streams practical, the demoscene has become even more niche than it used to be.

Nowadays, most demos are more about art than cutting edge technology. The exceptions tend to be demos that are constrained in one way or another, for example by being written for old hardware, or limited in size.

I'd like to propose a new kind of constrained demo - one that isn't limited to a particular size, but which is completely procedurally generated. Such techniques have always been used in demos (particularly constrained ones) but generally combined with other techniques. It would be interesting to see what is possible in demos that have no data that is temporally or spatially indexed or was generated by transforming temporally or spatially indexed data. That means no waveforms (all sounds must be synthesized), bitmaps, JPEGs, MP3s or video streams. Vector graphics are allowed, but only if they are hand-drawn (so you can't generate a vector image by automatically stenciling a bitmap image of the Mona Lisa, for example). If you want the Mona Lisa in your demo you have to draw the vectors yourself. Bitmap art could be done in the same way - rather than by storing the finished bitmap in the demo binary, one would have to store the sequence of commands the artist used to draw the image in whatever graphics program (which could then be rendered to a bitmap at startup time).
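
As a tiny illustration of that last point, a bitmap can be reconstructed at startup from a recorded list of drawing commands rather than being stored directly (the stroke format here is invented):

    #include <cstdio>
    #include <cmath>
    #include <vector>

    // A recorded drawing command: the artist's strokes, not the finished pixels.
    struct Stroke { int x0, y0, x1, y1; unsigned char shade; };

    // Render the command list into a bitmap at startup time.
    void render(const std::vector<Stroke>& strokes, unsigned char* bitmap, int w, int h)
    {
        for (const Stroke& s : strokes) {
            int steps = (int)std::hypot(s.x1 - s.x0, s.y1 - s.y0) + 1;
            for (int i = 0; i <= steps; ++i) {
                int x = s.x0 + (s.x1 - s.x0) * i / steps;
                int y = s.y0 + (s.y1 - s.y0) * i / steps;
                if (x >= 0 && x < w && y >= 0 && y < h)
                    bitmap[y * w + x] = s.shade;
            }
        }
    }

    int main()
    {
        const int w = 40, h = 12;
        std::vector<unsigned char> bitmap(w * h, ' ');
        // The "art" stored in the demo binary: a handful of strokes.
        std::vector<Stroke> strokes = {
            {2, 10, 10, 1, '#'}, {10, 1, 18, 10, '#'},    // an 'A'-ish shape
            {6, 6, 14, 6, '#'},
            {22, 1, 22, 10, '*'}, {22, 10, 32, 10, '*'},  // an 'L'
        };
        render(strokes, bitmap.data(), w, h);
        for (int y = 0; y < h; ++y)
            printf("%.*s\n", w, (const char*)&bitmap[y * w]);
    }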