Archive for the ‘computer’ Category

Virtual physical tone matrix

Wednesday, July 28th, 2010

Here's a virtual version of my physical tone matrix, implemented in Flash, so you can play along at home:

Much of the code is a direct translation from the C code in the non-virtual version, so it works in a way very close to the real thing. Even the sample rate and frame rate are close (though they are not synchronized - it's not a cycle-exact simulation).

Click or drag on the screen to turn the lights on and off.

The knobs are (from top left) sustain, tempo, tuning and volume. Above and between the sustain and tempo knobs is the escape switch, which brings up or closes a menu of functions. These are, from top left:

  • Sine wave (default)
  • Square wave
  • Triangle wave
  • Sawtooth wave
  • Random wave
  • Pattern editor (default)
  • Coarse waveform editor
  • Fine waveform editor (not particularly useful or easy to use)
  • Tuning editor
  • Miscellaneous editor
  • Save current pattern/waveform/tuning/settings to EEPROM
  • Load current pattern/waveform/tuning/settings from EEPROM
  • Toggle life mode
  • Toggle random mode
  • Toggle microtone keyboard mode

The miscellaneous editor shows 6 rectangles. Clockwise from top-right these are:

  • Patterns per loop (a 5-bit binary number) - if this is not 1, then whenever the pattern reaches the right a new pattern is loaded from EEPROM.
  • Fixed tuning - just disables the tuning knob. This exists to allow the tuning to be set more accurately on the non-virtual version; it isn't really necessary on the virtual version.
  • Update noise continually - when set, the waveform is continually updated with random data, so all frequencies sound the same. This is slightly different to the "random wave" waveform preset, where the 256 waveform components are initialized with random numbers but then don't change, so there are audible differences when it is played at different frequencies (see the sketch after this list).
  • Beats per pattern - makes the pattern repeat before it gets all the way to the right, useful for rhythms whose length isn't a factor of 16.
  • Sustain override - overrides the function of the sustain knob so that this parameter can be set digitally.
  • Tempo override (a 16-bit binary number) - must be greater than 0x200 to work (smaller numbers are faster).
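
To make the difference between the "update noise continually" setting and the "random wave" preset concrete, here's a rough C++ sketch of the two behaviours as I've described them - the array name and the use of rand() are my own illustration, not the PTM's actual source:

#include <cstdlib>

static unsigned char waveform[256];   // the 256 waveform components mentioned above

// "Random wave" preset: fill the table with random samples once and then leave it
// alone, so playing it back at different rates still gives different pitches.
void initRandomWavePreset()
{
    for (int i = 0; i < 256; ++i)
        waveform[i] = (unsigned char)std::rand();
}

// "Update noise continually": keep overwriting the table with fresh random data,
// so the output is plain noise and every frequency sounds the same.
void updateNoise()
{
    for (int i = 0; i < 256; ++i)
        waveform[i] = (unsigned char)std::rand();
}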

The EEPROM save/load functions load or save the data for the current editing mode (pattern, coarse waveform, fine waveform, tuning or miscellaneous). To use them, press save or load. The screen then shows the current EEPROM contents. Click on a location to choose where to save to or load from. Depending on the editing mode, 2-8 lights will be lit per "saved file".

Fun things to try:

  • Put the PTM in random mode and light 10-16 lights. Then just sit back and listen to some randomly generated music for a while.
  • Draw a glider or Lightweight spaceship, put the PTM in life mode and turn the tempo all the way up to maximum.
  • Make a horizontal line in pattern editor mode, turn the sustain all the way up so that the sound is continuous, then switch to the coarse waveform editor (which was inspired by this). Click and drag around to move the "sliders" and see what sounds you can make.
  • Make sure the pattern editor is empty and then switch to microtone keyboard mode. The lit lights correspond to the white notes on a piano keyboard - the unlit lights are the frequencies in-between (i.e. it's a 34-TET instrument). Try to play a tune or make some weird music. Trying the different preset waveforms is fun in this mode.
  • (Advanced) Try to create a different tuning such as a major scale or a chromatic scale (instead of the default pentatonic) using the tuning editor. Each line of the tuning editor represents a frequency as a 16-bit binary number (LSB on the left), in units of roughly 0.24Hz (16MHz / 2^26) - there's a conversion sketch after this list.
  • (Advanced) Make some patterns and save them in consecutive locations in EEPROM, starting at the top left. Then use the "patterns per loop" setting in the miscellaneous editor to play them back in sequence and make a tune longer than 16 notes.
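
Here's the conversion sketch promised above - a few lines of C++ (my own illustration, not PTM code) that turn a frequency in Hz into the 16-bit value the tuning editor expects, and back:

#include <cstdio>

// One tuning unit is 16MHz / 2^26, i.e. roughly 0.2384Hz.
const double unitHz = 16000000.0 / 67108864.0;   // 67108864 = 2^26

// Frequency in Hz to the 16-bit tuning value (rounded to the nearest unit).
unsigned hzToTuning(double hz) { return (unsigned)(hz / unitHz + 0.5); }

// ...and back again, to check the result.
double tuningToHz(unsigned value) { return value * unitHz; }

int main()
{
    unsigned a440 = hzToTuning(440.0);   // comes out around 1845
    std::printf("A440 -> %u units -> %.2f Hz\n", a440, tuningToHz(a440));
}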

The copy and paste functions on the right-click menu work and are compatible with the original Tone Matrix applet.

Source code is here.

USB device classes and usage pages

Thursday, July 22nd, 2010

The Universal Serial Bus interface specification includes various fields to specify what kind of device is plugged into a computer, and which numbers correspond to which buttons, lights and so on on the device. All these are documented in the Human Interface Device Usage Tables. When I discovered this, it provided me with some amusement, as alongside the normal mice, keyboards and scanners there are some more exotic devices. Section 5 describes "Simulation Controls", i.e. standardized controllers for operating various simulators. As well as the ones you'd expect, like a steering wheel (Usage ID C8) and a flight yoke (Usage ID 24), there are such controls as "Standard spaceship controls" (Usage ID 04).

I had no idea that spaceship controls were standardized. I suppose it makes sense, so that once you learn how to fly one sort of spaceship you could (in principle, in some sort of emergency situation) have some chance of operating an unfamiliar craft. Of course, if that emergency situation is being kidnapped by aliens you're probably out of luck, as I doubt any alien spaceship designers have read the USB specification documents.

What's even funnier is the "Standard magic carpet controls" (Usage ID 0B). Being a magical (not to mention fictional) mode of transportation, it seems quite amazing that magic carpets have standardized control mechanisms. You'd think the interface would be something as indistinguishable-from-sufficiently-advanced-technology as "just think about where you want it to go", but the USB documents go into quite some detail about how to fly a magic carpet:

Allows a device to be generally classified as one that uses the standard control of a magic carpet. This control is a bar, grasped by both hands, that controls the Yaw, Pitch and Roll of the carpet.

The bar, at which the pilot sits, may be pushed forward or pulled back to cause the carpet to dive or rise, respectively. In the zero position, the carpet is in level flight. Pushing forward on the bar causes the carpet to nose down and generates negative values. Pulling back on the bar causes the carpet to nose up and generates positive values.

Turning the bar turns the carpet. In the zero position, the carpet travels straight ahead. Pulling back on the right side turns the carpet to the right and generates positive values. Pulling back on the left side turns the carpet to the left and generates negative values.

Rotating the bar rolls the carpet. In the zero position, the carpet travels level. Rotating the bar in a clockwise direction rolls the carpet to the right and generates positive values. Rotating the bar in the counterclockwise direction rolls the carpet to the left and generates negative values.

So there you go - now if you find yourself in a Persian golden-age fairy tale, you know how to get around.

I am a monad

Saturday, July 17th, 2010

In the programming language Haskell, it is normal to work with functions that depend only on their input values, and which do nothing except return an output value. Functional programming is generally much easier to reason about, because there's no chance of one piece of a program making a change to global state that affects an apparently unrelated part of the program.

This does leave the problem of how a functional program can communicate with the outside world. Various approaches have been tried, and the one Haskell chooses is particularly elegant. Functions that need to accept input or produce output take an IO object as a parameter and produce a (generally different) IO object as the result. The IO object "encapsulates" the entire world outside the program. To actually run a Haskell program, the machine performs a sequence of IO transformations - taking one IO object, evaluating the program as much as necessary to determine the next IO object, and then actually performing the corresponding input or output operation before restarting the loop.

So there's a sort of inversion between how we normally think about function evaluation and how the evaluation actually happens. One can't really pass around the entire state of the universe as a parameter the way a C programmer would pass an int, so one must fake it by moving the "state of the universe" to the outside of the program and rearranging everything else so that it works the same way as it would if you could.

I can't prove it, but I think there's a very deep idea here with implications for how we understand the universe and our place in it. Much like pure functions can't describe IO, it seems like physics as we understand it can't describe human consciousness (in particular, subjective experience). Some suggest that this means consciousness is an illusion, but this has never been a satisfying answer to me.

Physics is done by describing the universe objectively - there are these particles at these positions and such and such a field had value x over here at this time (somewhat like values in a Haskell program). There are rules describing how the state of the universe evolves through time (somewhat like pure functions). But none of these things really seem to be able to describe what it is like to "feel" seeing the colour red (for example). Physics can describe a stream of 700nm wavelength photons hitting a retina, causing neurons to fire in particular patterns, some of which resemble the patterns that have occurred when the same retina received 700nm photons previously. These patterns then provoke other patterns which previously occurred in conjunction with the "700nm" patterns, and cause the release of small amounts of hormones. Understanding this system completely (which admittedly we don't) would allow one to (in principle) predict exactly which previous experiences would be recalled by any given stimulus, and might even allow one to predict exactly how the stimulated individual would react. But none of this seems to be able to tell us what experiencing red is actually like because we have no way to describe subjective experiences objectively.

We experience the universe entirely from a subjective point of view - through our senses. The objective model is useful because it allows us to reason, communicate and simulate but I suspect that in saying that objective reality is the real thing and subjective reality is just an illusion, we would be making a mistake and not seeing the forest for the trees.

Instead, I would like to suggest that we perform the same inversion that the Haskell compiler does. Instead of thinking of human beings as unexplained (possibly unexplainable) things within an objective universe, think of them as the IO hooks of the universe: input from free will (assuming that it exists) and output to subjective experience. This IO doesn't (necessarily) go to "another universe" - it's just a primitive, axiomatic thing that may have no more relevance to our objective universe than the implementation details of the IO monad do to a pure functional Haskell program. Experiencing life is running the program.

One important difference between the universe and a Haskell program is that a Haskell program only has one IO monad, but the universe seems to have many subjective observers in it. Having multiple IO monads would be a problem for a Haskell program because the IO monad defines how the program is actually evaluated - there's only one "real universe" for it to run in. But there's no problem having multiple non-IO monads - if the monads can't communicate with each other through the outside world (only through the program) you can have as many of them as you like. Since people can't communicate with each other except through the physical universe, there's no problem here.

Does this mean that one observer in the universe is privileged to be the "IO monad" whilst everyone else is a p-zombie? From the point of view of that observer, it certainly seems like that is a possible way of thinking about it, but since there's no objective difference between an IO monad and a non-IO monad (as long as the monads only communicate objectively), I'm not sure the distinction is meaningful.

Bailout thoughts

Thursday, July 15th, 2010

In 2008, the US government spent a large amount of money bailing out financial organizations troubled as a result of the subprime mortgage crisis. The fiscal conservative in me thinks that maybe socializing losses isn't such a good idea, and that doing so will cause the invisible hand to engineer boom and bust cycles to siphon large amounts of public money towards private financial institutions again in the future.

On the other hand, the consensus seems to be that it has worked, things are looking up and it cost less than expected, so maybe it was the right thing to do.

I remember reading at the time about the disastrous consequences that were predicted if TARP wasn't passed - banks would fail, making it very difficult for ordinary businesses (who didn't cause the crisis) to get credit to grow and/or continue operating.

The obvious answer is to let the banks fail and to bail out these innocent businesses instead - lending to them directly instead of lending to the banks so the banks can lend to the businesses. But figuring out which businesses are good ones to lend money to is a rather complicated and difficult process in itself - one that the government isn't really set up to do. It makes perfect sense for the government to outsource that work to private institutions (i.e. banks) who were doing it anyway (and not doing altogether too bad of a job at that side of it).

On the other hand, how can we stop this happening again in the future? I think the answer is to tie bailout money to the enactment of regulations designed to stop this happening again. In particular, for the current economic crisis it seems like the conflict of interest between credit rating agencies and the issuers of securities was a major cause, and I suspect that well placed regulations there could prevent similar crises.

The fundamental differences between Windows and Linux

Wednesday, July 14th, 2010

When I started to get deeper into Linux programming, I found it fascinating to learn about the design decisions that were made differently in Windows and Linux, and the history between them.

One example in particular is how dynamic linking is done. Windows dynamic link libraries (DLLs) are compiled and linked to load at a particular address in memory. When a DLL is loaded at its preferred address, the loader just inserts the addresses of its exported functions into the import tables of the modules that use it. A DLL can be loaded at a different address (in the case that its preferred address is already in use), but then the loader must relocate it, which is an expensive operation and prevents the DLL's code from being shared with other processes. Once loaded, however, code in a DLL is no slower than any other code on the system (calls from one module to another are slightly slower than calls within a module, though, since there is a level of indirection involved).

With Linux, on the other hand, the equivalent (shared object files) are compiled to use position independent code, so they can be loaded anywhere in memory without relocation. This improves process startup speed at the expense of runtime speed - because absolute addresses cannot be used, in situations where they would otherwise be used the load address must be found and added in.

Another way Linux improves startup speed at the expense of runtime speed is lazy binding of function calls. In a Linux process, a call to a shared library function is not normally resolved until the first time it is made. Function pointers initially point to the resolver, and when a function is resolved that pointer is replaced with the address of the function that was found. This way, the loader doesn't spend any time resolving functions that are never called.
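
Here's a toy C++ illustration of the lazy-binding idea (my own sketch, not the actual glibc/PLT machinery): the first call goes through a resolver, which patches the function pointer so that later calls are direct.

#include <cstdio>

// The "real" library function, which in reality would live in a shared object.
double halve(double x) { return x * 0.5; }

double resolverStub(double x);   // forward declaration so the slot can start here

// The "PLT slot": initially points at the resolver, later at the real function.
double (*halveSlot)(double) = resolverStub;

double resolverStub(double x)
{
    std::puts("resolving on first call...");
    halveSlot = halve;        // patch the slot so future calls skip the resolver
    return halveSlot(x);      // complete the original call
}

int main()
{
    std::printf("%f\n", halveSlot(3.0));   // resolves, then calls
    std::printf("%f\n", halveSlot(5.0));   // direct call this time
}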

It makes perfect sense that Linux should sacrifice runtime speed for startup speed given the history of Unix. The first versions of Unix had no multithreading - each thread of execution (process) had its own memory space, so you needed to be able to start processes as quickly as possible. With the fork() system call, a new process could be created by duplicating an existing process, an operation which just involved copying some kernel structures (the program's data pages could be made copy-on-write). Because process startup was (relatively) lightweight, and because of Unix's philosophy that a program's responsibilities should be as limited as possible (and that complex systems should be made out of a number of small programs), processes tend to proliferate on operating systems modelled on Unix to a much greater extent than on Windows.
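
For reference, this is roughly what that cheap process creation looks like in code on a Unix-like system - a minimal sketch with the error handling omitted:

#include <cstdio>
#include <sys/wait.h>
#include <unistd.h>

int main()
{
    pid_t child = fork();    // duplicate the current process
    if (child == 0) {
        // Child: shares copy-on-write pages with the parent until either one writes to them.
        std::printf("child %d running\n", (int)getpid());
        _exit(0);
    }
    // Parent: wait for the child to finish.
    waitpid(child, NULL, 0);
    std::printf("parent reaped child %d\n", (int)child);
}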

However, this does mean that a program developed with Windows in mind (as a single monolithic process) will tend to run faster on Windows than on Linux, and a program developed with Linux in mind (as many small cooperating processes continually being created and destroyed) will tend to run faster on Linux than on Windows.

Another way in which Linux and Windows differ is how they deal with low memory situations. On Linux, a system called the "OOM killer" (Out Of Memory killer) comes into play. The assumption is that if a machine is running too low on memory, some process or other has gone haywire and is using it all. The OOM killer tries to figure out which process that is (based on which processes are using a lot of memory, and which critical system processes are trusted not to go haywire) and terminates it. Unfortunately it doesn't always seem to make the right choice, and I have seen Linux machines become unstable after they run out of memory and the OOM killer kills the wrong thing.

Windows has no OOM killer - it will just keep swapping memory to disk and back until you get bored and kill the offending process yourself or reboot the machine. It's very easy to bring a Windows machine to its knees this way - just allocate more virtual address space than there is physical RAM and cycle through it, modifying each page as rapidly as possible. Everything else quickly gets swapped out, meaning that even bringing up the task manager to kill the program takes forever.
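
To make that concrete, a sketch along these lines is all it takes (the sizes are placeholders - pick a total larger than your physical RAM, and don't run it on a machine you care about):

#include <cstddef>
#include <vector>

int main()
{
    const std::size_t chunkBytes = 256 * 1024 * 1024;   // 256MB per allocation
    const int chunkCount = 64;                          // 64 x 256MB = 16GB total

    std::vector<char*> chunks;
    for (int i = 0; i < chunkCount; ++i)
        chunks.push_back(new char[chunkBytes]);

    // Cycle through every page, dirtying it, as fast as possible. Once the working set
    // exceeds physical RAM, everything else gets swapped out and the machine crawls.
    for (;;)
        for (std::size_t i = 0; i < chunks.size(); ++i)
            for (std::size_t offset = 0; offset < chunkBytes; offset += 4096)
                chunks[i][offset]++;
}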

User code in kernel mode

Monday, July 12th, 2010

Most modern operating systems have a great deal of complexity in the interface between user mode and kernel mode - 1100 syscalls in Linux 2.4.17 and some 400-odd in Windows Vista. This has implications for security (since all those calls need to be hardened against possible attack) and also for how difficult it is to learn to program a particular platform (albeit indirectly, since the syscall interface is not usually used directly by application programs).

It would be much better if the system call interface (and the programming API on top of that) were as minimal as possible whilst still securely multiplexing hardware and abstracting away hardware differences.

The reason it isn't is that doing so would make an operating system extremely slow. It takes some 1000-1500 cycles to switch between user mode and kernel mode. So you can't do a syscall for each sample in a realtime-generated waveform, for example. There are many other ways that an operating system could be simplified if not for this cost. TCP, for example, is generally implemented in the kernel despite the fact that it is technically possible to implement it in userspace on top of a raw IP packet API. The reason is that network performance (especially server performance) would be greatly impacted by all the extra ring switching that would be necessary whenever a packet is sent or received.
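
If you want a ballpark figure for that switching cost on your own machine, a crude measurement like this works (Linux; it assumes getppid() really does enter the kernel rather than being cached by the C library):

#include <cstdio>
#include <time.h>
#include <unistd.h>

int main()
{
    const int iterations = 1000000;
    timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iterations; ++i)
        getppid();    // a near-trivial syscall, so most of the cost is the ring switch
    clock_gettime(CLOCK_MONOTONIC, &end);
    double ns = (end.tv_sec - start.tv_sec) * 1e9 + (end.tv_nsec - start.tv_nsec);
    std::printf("%.0f ns per syscall\n", ns / iterations);
}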

A much simpler system API would be possible if user programs could run code in kernel mode - that way they could avoid the ring switch and all the other associated overhead. This was the way things used to be done, back in the days of DOS. Of course, back then every application you used needed to support every piece of hardware you wanted to use it with, an application crash would take down the entire system and there was no such thing as multi-user. We certainly don't want to go back to those bad old days.

Microsoft's Singularity OS (as I've mentioned before) solves this problem by requiring that all code is statically verifiable, and then runs it all in ring 0. Given how much performance enthusiasts like me like to write highly optimized but completely unverifiable code, I think it will be a while before memory protection becomes a thing of the past.

But maybe there's a middle path involving both approaches - unverified code using the MMU and verified code running in the kernel. A user application could make a system call to upload a piece of code to the kernel (perhaps a sound synthesis engine or a custom TCP implementation) and the kernel could statically verify and compile that code.
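
No such interface exists, but to make the idea concrete, the entry point might look something like this hypothetical C++ sketch - every name and type here is invented for illustration:

#include <cstddef>
#include <cstdint>

// Hypothetical: nothing like this exists in any real kernel I know of.
typedef int kcode_handle;

// Upload a blob of verifiable intermediate code; the kernel checks it for memory
// safety, compiles it, and returns a handle (or a negative error code on failure).
kcode_handle kcode_upload(const std::uint8_t* code, std::size_t length);

// Run a previously uploaded routine in kernel mode - e.g. a synthesis engine filling
// an audio buffer, or a custom TCP implementation processing a batch of raw packets -
// paying for one ring switch per call rather than one per sample or per packet.
int kcode_call(kcode_handle handle, void* sharedBuffer, std::size_t bufferLength);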

With a suitably smart compiler, parts of the kernel could also be left in an intermediate form so that when the user code is compiled, the kernel implementations could be inlined and security checks moved outside of loops for even more speed.

Another thing that such a system would make practical is allowing applications to subvert the system calls of less trusted applications.

Optimizing by taking advantage of undefined behaviour

Sunday, July 11th, 2010

Undefined behavior doesn't sound like something you'd ever deliberately introduce into a program's codebase. Programmers generally prefer their programs' behaviors to be fully defined - nasal demons are rarely pleasant things.

However, there are sometimes good reasons for adding a statement to your program with undefined behavior. In doing so, you are essentially telling the compiler "I don't care what happens if control flow reaches this point", which allows the compiler to assume that control flow will never get to that point if doing so will allow it to better optimize the parts of the program that you do care about. So it can be an optimization hint.

A good way to use this hint is to put it in the failure path of assert statements. In checked/debug builds, assert works as normal (does the test and terminates the application with a suitable message if the test fails) but in release/retail builds assert checks the test and does something undefined if the test fails. The compiler can then skip generating the code to actually do the test (since it's allowed to assume that it will never fail) and can also assume that the same condition is false in the subsequent code, which (if it's sufficiently smart) will allow it to generate better code.

GCC has a built-in function for just this purpose: the __builtin_unreachable() function is specified to have undefined behavior if it is ever called, and is actually used in just this way. I think this is really clever, in a slightly twisted way.
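
As a concrete sketch of the technique (using the GCC builtin mentioned above; the debug branch is just a conventional assert):

#include <cstdio>
#include <cstdlib>

#ifdef NDEBUG
// Release build: if the condition is false, behaviour is undefined - which means the
// compiler is free to drop the test entirely and assume the condition downstream.
#define ASSERT(condition) do { if (!(condition)) __builtin_unreachable(); } while (0)
#else
// Debug build: check, report and abort on failure.
#define ASSERT(condition) \
    do { \
        if (!(condition)) { \
            std::fprintf(stderr, "assertion failed: %s (%s:%d)\n", #condition, __FILE__, __LINE__); \
            std::abort(); \
        } \
    } while (0)
#endif

int divide(int numerator, int denominator)
{
    ASSERT(denominator != 0);   // in release builds the compiler may assume this holds
    return numerator / denominator;
}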

What happened to DirectSound?

Saturday, July 10th, 2010

Some years ago, I was tinkering about with some sound synthesis code in Windows. This was in the pre-Vista days, and the officially recommended way of doing sound output was using DirectSound. My application was interactive, so I wanted to make it as low latency as possible without having too much overhead. I set it up with a buffer size of 40ms (3,528 bytes). I then set up IDirectSoundNotify with two pointers, one at the middle of the buffer and one at the end. Whenever one was triggered I would fill the other half of the buffer. This all worked great.
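
For the curious, the setup looked roughly like this - a condensed C++ sketch of that arrangement (not the original code), with all error handling omitted:

#include <windows.h>
#include <dsound.h>    // link with dsound.lib and dxguid.lib

const DWORD bufferBytes = 3528;    // 40ms of 16-bit mono audio at 44100Hz

void setUpNotifications(HWND hwnd)
{
    IDirectSound8* ds = 0;
    DirectSoundCreate8(NULL, &ds, NULL);
    ds->SetCooperativeLevel(hwnd, DSSCL_PRIORITY);

    WAVEFORMATEX format = {0};
    format.wFormatTag = WAVE_FORMAT_PCM;
    format.nChannels = 1;
    format.nSamplesPerSec = 44100;
    format.wBitsPerSample = 16;
    format.nBlockAlign = format.nChannels * format.wBitsPerSample / 8;
    format.nAvgBytesPerSec = format.nSamplesPerSec * format.nBlockAlign;

    DSBUFFERDESC desc = {0};
    desc.dwSize = sizeof(desc);
    desc.dwFlags = DSBCAPS_CTRLPOSITIONNOTIFY | DSBCAPS_GLOBALFOCUS;
    desc.dwBufferBytes = bufferBytes;
    desc.lpwfxFormat = &format;

    IDirectSoundBuffer* buffer = 0;
    ds->CreateSoundBuffer(&desc, &buffer, NULL);

    // Notification points at the middle and the end of the buffer: whenever one
    // fires, refill the half of the buffer that just finished playing.
    IDirectSoundNotify* notify = 0;
    buffer->QueryInterface(IID_IDirectSoundNotify, (void**)&notify);
    HANDLE events[2] = { CreateEvent(NULL, FALSE, FALSE, NULL),
                         CreateEvent(NULL, FALSE, FALSE, NULL) };
    DSBPOSITIONNOTIFY positions[2] = { { bufferBytes / 2, events[0] },
                                       { bufferBytes - 1, events[1] } };
    notify->SetNotificationPositions(2, positions);

    buffer->Play(0, 0, DSBPLAY_LOOPING);
    // Main loop: WaitForMultipleObjects on the two events, then Lock/fill/Unlock
    // the appropriate half of the buffer.
}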

At least, it did until I came back to the code some years later, after having upgraded the OS on my laptop to Windows Vista (the hardware hadn't changed). Suddenly this code which worked great before sounded horrible - the sound was all choppy as if only half of the buffers were being played. What happened?

After some experimentation, I discovered that the jittering happened with a buffer of less than 7,056 bytes (80ms) but not with larger buffers. Armed with this evidence and a bit of pondering, I have a good theory about what happened.

The Windows audio system was drastically rewritten in Windows Vista and DirectSound was re-implemented - instead of a thin layer over the driver, it became a high level API implemented on top of the Windows Audio Session API (WASAPI). In doing so, it lost some performance (it's no longer hardware accelerated) and, it seems, suffered an increase in latency - the DirectSound implementation uses a 40ms buffer (I think). That's all very well, but there's a bug in the implementation - IDirectSoundNotify fails to trigger if the positions are more closely spaced than this.

The preferred API for this kind of thing is now XAudio2 which is actually a little bit nicer (the code to do the same thing is slightly shorter) and works on both XP and Vista (unlike WASAPI). I can't really fault Microsoft too much since apparently this particular use of IDirectSoundNotify is rather unusual (or they would have made it work) but still it's annoying that DirectSound went from being the recommended API to buggy and (practically if not technically) deprecated in a single Windows version. Still, I understand that the world of Linux audio is even worse (albeit getting slowly better).

I wonder why audio APIs seem to go through so much churn relative to graphics (given that audio is so much less complicated, and that the hardware isn't really changing much any more). I sometimes wish all the audio APIs were as simple as a "get the next sample" callback API, but I guess this is too CPU intensive, or at least was when the APIs were designed.

Website for making websites

Friday, July 9th, 2010

This is an idea I came up with when I was working on Argulator. Making websites like this involves a lot of very fiddly work - keeping the server-side and client-side bits synchronized, implementing login/logout stuff and so on. It would be great if there was a site (a "MetaSite" if you will) that let you create data-driven web applications whilst writing a minimal amount of code - sort of like how WordPress lets you make a blog whilst writing a minimal amount of HTML and CSS.

MetaSite would have, built in, the ability to create user accounts, login, logout, change password and so on. The users thus created can then create their own sites and interact with sites built by others. Some sites might require email verification and/or a captcha solve before they will allow the user to do certain things - MetaSite would take responsibility for that and once these tasks are done they don't need to be redone for each application.

There would be an "admin hierarchy" in that an admin of a site can appoint other users as admins with special powers (moderation and so on), who can then further delegate their powers. When power is withdrawn from an admin, it is also withdrawn from the entire closure of admins they've delegated it to.

Users would be given tools to allow them to specify which sites can access the information stored by which other sites. One such "built in" site might be a repository of personal information.

Another useful "built in" site would be messaging - a virtual inbox which sites can use to notify their users when events of interest happen.

Yet another useful "built in" would be a shopping cart application which lets applications act as online shops. So if you've written a site and you decide to add a feature which you want to charge users for, it's as simple as just ticking a box. Since payment is centralized with MetaSite, it would be possible to do micropayments (making a single credit card charge to buy "tokens" which can be spent on any MetaSite site).

So far, nothing too unusual - most of this has already been done. Where MetaSite really comes into its own is its rapid application development tools. If you want to create a blogging service, a photo hosting service, an Argulator clone, a wiki or whatever else, it's just a matter of defining what the "objects" that are stored in your system are and how users can interact with them. MetaSite would have various widgets built in, so if you define one field to be a date and say that users can edit this date, all the code for a calendar control widget would be automatically instantiated. All the defaults would be secure, reasonably attractive and AJAXified so that the site is nice to use. When developers do need to resort to code it would be written in a sandboxed language (not just uploading raw PHP to the site - that would be a security nightmare). This language would have intrinsics which abstract out all the web-specific stuff and allow developers to just concentrate on their application domain.

This is the big difference between MetaSite and Facebook - if you want to create a Facebook application you need to have your own web server and you need to write the server-side code to run on it. MetaSite would have a very shallow learning curve - making a new site should be as easy as starting a blog.

MetaSite applications would be limited in the amount of storage space and CPU time they can use, so that they don't adversely affect other sites. One way MetaSite could make money would be to sell extra CPU time and storage space to sites that need it. MetaSite could also make it easy for site admins to add advertising to their sites to monetize them and/or pay for such extras. Another value added feature might be the ability to run a MetaSite site with a custom domain name, or even on your own server.

Everything on MetaSite would be customizable - objects, data editing widgets, even entire applications. In fact, the usual way of creating one of these would be to take an existing thing that's similar to the thing that you're trying to make, and modify it. These modifications could then be available to authors of other sites as starting points for their own things (in fact, I think this should be the default and the ability to keep your customizations proprietary should be a "value added" feature).

Ideally it would be the first stop for any web startup trying to get their site up and running as quickly as possible, but if nothing else it would be a way to take advantage of all those "I need a facebook clone and will pay up to $100" projects on vWorker (nee RentaCoder).

I've gone so far as to register MetaSite.org but I haven't done anything with it yet.

VC++ inline assembly hack

Thursday, July 8th, 2010

Here's a cute little hack I came up with when doing some fixed-point arithmetic stuff in Visual C++ a while ago. I wanted to make a templated C++ class with some inline assembly, the generation of which depends on a numeric template parameter. Doing the obvious thing doesn't work:

template<int N> class ArithmeticHelper
{
public:
    // Intended to compute (x*y) >> N, using the 64-bit product in edx:eax.
    // (Int5 is the 32-bit fixed-point integer type from the surrounding code.)
    static Int5 MultiplyShiftRight(Int5 x, Int5 y)
    {
        __asm {
            mov eax, x
            imul y              ; edx:eax = x * y
            shrd eax, edx, N    ; rejected - the inline assembler doesn't see N as a constant
        }
    }
};

However, one can introduce numeric parameters into inline assembly code by pretending that they are the length of something, and using the "length" operator:

template<int N> class ArithmeticHelper
{
private:
    static const int nn[N];  // VC's inline asm won't take N directly, but "length nn" evaluates to N
public:
    // Computes (x*y) >> N; the result is left (and returned) in eax.
    static Int5 MultiplyShiftRight(Int5 x, Int5 y)
    {
        __asm {
            mov eax, x
            imul y                      ; edx:eax = x * y
            shrd eax, edx, length nn    ; shift the 64-bit product right by N
        }
    }
};
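
Just for illustration, a caller might use it like this - assuming (as the asm implies) that Int5 is a 32-bit signed integer typedef and that the values hold 16.16 fixed-point numbers:

typedef int Int5;    // assumption for this example only

Int5 a = 229376;                                            // 3.5  in 16.16 fixed point
Int5 b = 147456;                                            // 2.25 in 16.16 fixed point
Int5 c = ArithmeticHelper<16>::MultiplyShiftRight(a, b);    // 516096, i.e. 7.875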