Archive for the ‘computer’ Category

Why GOTOs are bad

Friday, September 23rd, 2011

It seems to me that a lot of the complaints about GOTOs turning programs into spaghetti are really about jumping into a block. Jumping out of a block is much easier to reason about, and in fact with the minor modification to the break keyword I mentioned before it's trivial to turn any such GOTO into structured code without even rearranging anything.

I do have some C++ code (written for my Mandelbrot plotter program) which uses "goto". I originally worked out the algorithm as a state machine, so writing it in that way was natural. I tried rewriting it to use loops and conditionals instead but it just obscured what the program was really trying to do. I could have used a switch statement inside a loop as well but I think that's really just a more verbose way of saying the same thing.
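
The actual plotter code isn't reproduced here, but a toy stand-in shows the shape of the argument. Below is a small made-up state machine written first with goto and then as the switch-inside-a-loop equivalent; every jump goes backwards or out, never into a block, and the second version arguably just says the same thing more verbosely.

  // Toy state machine written directly with labels and goto. Every jump is
  // backwards or out of the current block, never into one.
  #include <cstdio>

  void run_goto(int n)
  {
  start:
      if (n <= 1)
          goto done;
      if (n % 2 == 0) {
          n /= 2;
          goto start;
      }
      n = 3 * n + 1;
      goto start;
  done:
      std::printf("reached 1\n");
  }

  // The same machine as a switch inside a loop - the states become enum
  // values and the gotos become assignments to 'state'.
  enum State { Start, Done };

  void run_switch(int n)
  {
      State state = Start;
      while (state != Done) {
          switch (state) {
          case Start:
              if (n <= 1) { state = Done; break; }
              n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
              break;
          case Done:
              break;
          }
      }
      std::printf("reached 1\n");
  }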

When is your program compiled?

Thursday, September 22nd, 2011

When we're all using languages which can compile code at runtime, it's going to be more important to think about when your code is being compiled. Take a fractal plotter for example - one in which users can supply a formula to be iterated. Obviously we'd like to do some compilation here instead of just interpreting the formula, as the latter would be very slow.

But how often should we recompile it? The more often we recompile, the more information we have available to us and the more optimizations we can apply. For example, if we just compile when the formula changes, we have to use arithmetic operators which can work at any precision, but if we recompile whenever the precision changes we can unroll those arithmetic operators and make the code much faster (especially at low precisions). On the other hand, recompiling has some overhead, so we probably wouldn't want to recompile for each pixel. Though for some formulae even that might actually be helpful - if we can hoist a test from the per-iteration loop to the per-pixel loop and the iteration count is high, it might be worth it.
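
As a very rough C++ analogy for "recompile when the precision changes" (my own sketch, not the author's plotter - template instantiation stands in here for true runtime code generation), the idea is to pick a specialized kernel once per precision change and reuse it for every pixel:

  #include <cstddef>

  // Generic path: every operation loops over the limb count at runtime.
  unsigned sum_generic(const unsigned* limbs, std::size_t n_limbs)
  {
      unsigned s = 0;
      for (std::size_t i = 0; i < n_limbs; ++i)
          s += limbs[i];
      return s;
  }

  // Specialized path: the limb count is a compile-time constant, so the
  // compiler can unroll the loop completely.
  template <std::size_t NLimbs>
  unsigned sum_fixed(const unsigned* limbs, std::size_t /*n_limbs*/)
  {
      unsigned s = 0;
      for (std::size_t i = 0; i < NLimbs; ++i)
          s += limbs[i];
      return s;
  }

  using Kernel = unsigned (*)(const unsigned*, std::size_t);

  // Called when the precision changes - not per pixel, and not per iteration.
  Kernel pick_kernel(std::size_t n_limbs)
  {
      switch (n_limbs) {
      case 1: return &sum_fixed<1>;
      case 2: return &sum_fixed<2>;
      case 4: return &sum_fixed<4>;
      default: return &sum_generic;
      }
  }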

One possibility might be to give the code-generation library the freedom to compile whenever it likes, so it can try various things and run with what works best.

A weird thing in Haskell

Wednesday, September 21st, 2011

Here's something odd I noticed while playing around with the Haskell programming language. Sometimes, a==b does not imply f(a)==f(b). Look:

> 1/0
Infinity
> 1/(-0)
-Infinity
> 0==(-0)
True
> (1/0)==(1/(-0))
False

Language in which assignment is reflective

Tuesday, September 20th, 2011

Sometimes I like to take the most fundamental assumptions I can find about how things work and wonder, "what if this were not true?" Here is an example of that sort of crazy thinking.

In most programming languages the statement "a=b" assigns the value of "b" to the variable named "a" - the two things "a" and "b" play completely different roles (although after the statement has executed they both have the same value). What if assignment instead were reflective, so that "a=b" meant the same thing as "b=a"?

In that case, the information about whether this statement says something about "a" or something about "b" must come from elsewhere. For example, if earlier we had said "a=1" then both "a=b" and "b=a" would mean "b=1". So each variable would have some information associated with it (let's call it the "direction") that says whether it accepts values from other variables or passes its value to other variables when assigned.

If the direction of variables can change over the course of a program's run, it won't generally be possible to determine statically that two output variables are never assigned to each other, so that would have to become a run-time error. Unless we have the ability to change the direction of two connected variables at the same time, we'd have to allow two input variables to be assigned to each other (and give them both some default value).

Perhaps surprisingly, you can end up with a Turing-complete language by going down this route - all you need are two more elements: a way of connecting three variables together (obviously no two of them can be outputs at the same time), and a NAND or NOR operator with two inputs and an output. Then you can build up any function the same way you would build a logic circuit.
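
Here's a minimal sketch of how a runtime for such a language might treat variables and connections (my own invention - the post doesn't specify an implementation, and a real one would have to re-propagate values whenever connections or directions change rather than doing a single one-shot transfer):

  #include <stdexcept>

  enum class Direction { Input, Output };

  struct Variable {
      Direction dir = Direction::Input;
      bool value = false;          // inputs start with a default value
  };

  // "a = b" and "b = a" are the same statement: a symmetric connection.
  // Which side receives the value is decided by the variables' directions.
  void connect(Variable& a, Variable& b)
  {
      if (a.dir == Direction::Output && b.dir == Direction::Output)
          throw std::runtime_error("two outputs connected together");
      if (a.dir == Direction::Output)
          b.value = a.value;
      else if (b.dir == Direction::Output)
          a.value = b.value;
      // input connected to input: both just keep their default value
  }

  // The primitive operator from the post: two inputs and one output.
  void nand(const Variable& in1, const Variable& in2, Variable& out)
  {
      out.dir = Direction::Output;
      out.value = !(in1.value && in2.value);
  }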

What's the use of this? Well, it's handy for describing electronic circuits (where assignment means connection). Also, when I write my modular emulator it will have a language like this built in for describing how things are connected up. For that, the "two outputs connected together" case isn't just a theoretical problem any more - the Commodore PET computer had a Killer poke which connected two outputs together, damaging circuitry. An emulator could have some fun with how it handles this - I'm thinking an "explosion" graphical effect might be a nice easter egg.

Patches

Sunday, September 18th, 2011

One of the reasons the state of desktop computer security is in such a mess is that the average desktop machine has a lot of different pieces of software with attackable surface area, but they all have different update mechanisms, some more reliable than others, some more painful than others. On Windows machines, the Microsoft patches are pretty good - they come out on the second Tuesday of every month (occasionally sooner if a flaw is being actively exploited). Everything updates at once through a single mechanism. Usually there's a reboot required, but rebooting once a month isn't too bad. My biggest gripe is the massive amount of CPU time spent re-NGENing everything when there's a .NET update.

Then there are the Adobe patches, which seem to get discovered whenever the machine is rebooted and then themselves require a subsequent reboot (grr). Is that just Reader, or does Flash update through the same mechanism? I'm not even sure. Firefox updates itself almost every time I start it after not having used it for a while, forcing me to start it again. I know the Google applications update themselves, since I see googleupdater.exe in my task manager, but they're very quiet about it - I don't even know when they update.

It would be far better if everything were updated through a centralized mechanism, the way my Ubuntu box does it. Unfortunately, that update mechanism is very intrusive - it seems like every other day it finds some new updates and installs them (though it rarely requires a reboot, which is nice). Also unfortunately, I suspect it only works with software installed through the built-in software installation mechanism - programs I installed manually, like the Flash plugin and XMMS, won't be updated.

To ensure everybody's desktop machines are secure, here is how I think things should work:

  • There should be a centralized mechanism for notifying client machines of new versions of software. Actually there should be several - if I don't trust X's server to be up-to-date, I should be able to use Y's instead.
  • The server I'm using should keep track of which pieces of software I have on my machines and their version numbers. The update checking process should be trivial in the "no updates available" case: my client machine just asks the server "any new updates for me?" and the server goes "nope". Client machines should also know what they have installed, so that if I change servers the client just uploads the list of software to the new server (there's a rough sketch of this below, after the list).
  • Some servers might only track a subset of software - that's okay, I should be able to specify multiple servers.
  • Software updates come in two sorts - new versions and security bug fixes. New versions I want to be notified about (but don't necessarily want to install immediately). Security bug fixes I want installed as soon as possible. The defaults should reflect this (though it should be possible to opt out of automatic installation of security bug fixes - some administrators want to be able to check out new fixes before installing them to make sure they won't break anything).
  • When a security bug fix is downloaded, it should be installed automatically, the application (or even system) restarted if necessary, and everything put back in its previous state - i.e. I should never lose anything due to a restart. This is the really tricky bit.
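
A hypothetical sketch of that fast path (all the names and types here are made up for illustration - this isn't an existing protocol or API). Because the server keeps the per-client inventory, the common "no updates" case is a single tiny request:

  #include <map>
  #include <string>
  #include <utility>
  #include <vector>

  struct Update { std::string package, new_version; };

  struct UpdateServer {
      // inventory[client][package] = installed version
      std::map<std::string, std::map<std::string, std::string>> inventory;
      // package -> latest known version (only for software this server tracks)
      std::map<std::string, std::string> latest;

      // Client switches to this server: it uploads its full list once.
      void register_client(const std::string& client,
                           std::map<std::string, std::string> installed)
      {
          inventory[client] = std::move(installed);
      }

      // The trivial common case: "any new updates for me?" - usually empty.
      std::vector<Update> check(const std::string& client) const
      {
          std::vector<Update> result;
          auto it = inventory.find(client);
          if (it == inventory.end())
              return result;
          for (const auto& [pkg, version] : it->second) {
              auto l = latest.find(pkg);
              if (l != latest.end() && l->second != version)
                  result.push_back({pkg, l->second});
          }
          return result;
      }
  };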

Application updates are the easiest kind of restart - developers generally know if the application they are writing has security surface area, so (when it does) they can make it listen for messages from the updater application and, when an update is available, persist all its state, shut down, and restore that state on the next startup.
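
The contract between the updater and an update-aware application might look something like this (an invented interface for illustration, not a real updater API):

  #include <fstream>
  #include <string>

  struct AppState { std::string open_document; int cursor_position = 0; };

  void save_state(const AppState& s)
  {
      std::ofstream out("app.state");
      out << s.open_document << '\n' << s.cursor_position << '\n';
  }

  bool restore_state(AppState& s)
  {
      std::ifstream in("app.state");
      return std::getline(in, s.open_document) && (in >> s.cursor_position);
  }

  // Called when the updater signals that a security fix is ready: save
  // everything and exit. The updater applies the patch, relaunches the
  // application, and the new instance calls restore_state() on startup.
  void on_update_pending(const AppState& s)
  {
      save_state(s);
      // ...shut down cleanly here...
  }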

An update to a shared component is harder. Some of the applications which use the component might be update-aware, and can be restarted just as if they were being updated themselves. Others might not be - consider the case of a compiler, which is designed to do what it is asked and then exit; it's not supposed to be running for an extended period of time. These can generally be left alone to pick up the update on their next run, or trusted not to care if a component they are using is insecure. This means that multiple versions of a shared component might be in use at any given time - the system should be able to cope with this. Of course, if these shared components are separate processes this ceases to be a problem.

Inevitably (even with the ability to patch running kernels) there will be times when a system reboot is necessary. Perhaps a driver puts a piece of hardware into an insecure state and it physically can't be put back into the secure state without a hardware reset. So the OS needs to be able to freeze a process, serialize it to disk and then restore it to the exact same state after a reboot. This means that all applications (even those with no security surface area) need to be written to be able to cope with the clock suddenly jumping forward, or hardware devices they are using suddenly being reset. In modern pre-emptive multitasking systems this is usually the case anyway.

On some operating systems, a reboot implies a modification of the address space layout of processes - in particular, Vista's Address Space Layout Randomization (ASLR) rebases system DLLs on system startup so that they are not in predictable places. I think it doesn't particularly affect security if a "forced update" causes the system DLLs to keep the same address across that particular reboot - as long as the address space layout isn't predictable across a large fraction of the machines running that software, the defense mechanism retains its effectiveness.

Windows 7 annoyances

Friday, September 16th, 2011

In the same vein as my previous post on Vista, here are my thoughts on Windows 7, which I upgraded my laptop to a while back.

I kept the aero theme on for less than 24 hours this time - the trouble with it is that there isn't enough visual difference between the active and inactive windows. I'm used to a less subtle visual hint about where my typed characters will end up (I could probably get used to it if I had to). Also, the aero window borders are much bigger, which makes them look extra ugly to me. Yes, I know you wanted lots of pixels to show off your pretty glass effect, blah blah. Unfortunately Microsoft gratuitously changed the theme format and didn't see fit to include any way to import XP .theme files, so lots of tedious fiddling with the customization dialog ensued.

I don't like the new taskbar - it quickly annoyed me and I made it as XP-like as possible. Unfortunately Windows doesn't expose a setting to turn off the grouping of similar applications, so until I discovered 7 Taskbar Tweaker the button I was looking for was never where I expected it to be. The selector thingy which pops up when you hover over a taskbar button annoyed me more than it helped me.

Windows Mail is gone, replaced by Windows Live Mail, which is horrible. I found a hack to get Vista's Windows Mail running on Windows 7, which works fine, with one caveat: don't use "Take Ownership" to modify the files or you'll break WinSxS and destroy your ability to install service packs (seriously - I had to reinstall the OS to fix this). Instead, rename the "C:\Program Files\Windows Mail" and "C:\Program Files (x86)\Windows Mail" directories and then create copies of them with the original names. You should then be able to replace files in the new copies with no problems (I haven't tried installing a service pack since doing that, but at least it's trivial to undo - just two renames).

Unfortunately, when Windows Live Mail imports from a Windows Mail store, it breaks the Windows Mail store! I had to restore it from a backup, which was extremely annoying.

The new black and white power/wifi/volume icons are really ugly with a classic theme, and are spread too far apart. Unfortunately it appears that users can't change this. I don't like the new way of revealing hidden notification area icons.

The icons in Windows Explorer are much more spread out than those in Vista, so you get to see fewer files in the same space. Fixing this is unsupported and rather fiddly. (Here is my updated ExplorerFrame.dll, although I didn't update it after installing SP1 so it might not work anymore.)

I haven't really found myself using any of the new window management gestures, except accidentally (good job I knew they were there, or I would have been very confused). These might be one of those things that I pick up eventually and then can't live without, but at the moment they are just a vague annoyance. I turned the docking off as doing it accidentally was getting annoying.

UAC is less noisy than Vista's but I still turned it off when a standard command prompt wouldn't let me cd into the "\Users\Andrew" directory from my old Vista installation. I guess Microsoft doesn't expect people to use the keyboard to organize files in this day and age.

You still can't type a path into the dialog that you use to select a directory; you have to painstakingly navigate through the hierarchy with the mouse.

At least the terrible freezing bug when using remote desktop from an XP machine seems to be fixed.

It has crashed a few times which didn't give me a good feeling about it, but I think that was due to overheating (I forgot to reinstall the fan control program that stops this from happening).

It's very fast, which is probably due more to the fact that I upgraded to an SSD at the same time (I didn't want to run an SSD on an OS without TRIM support).

Thoughts on ActionScript

Thursday, September 15th, 2011

Declaring variables "var i:int" instead of "int i" is a nice experiment, but I don't think it works. When I need to turn an assignment into a declaration or vice-versa, I now need to do it in two places (before and after the variable name) instead of just one.

That this gives an error about redeclaring i is just plain dumb:

  for (var i:int = 0; i < 10; ++i) { foo(i); }
  for (var i:int = 0; i < 10; ++i) { bar(i); }

It means I have to declare all my variables at the top, old-fashioned-C-style, or spend ages fixing up the declarations whenever I move code around.

I hate it when compilers enforce a "one class per file, with the same name" rule. I'd rather have my source files organized for the convenience of humans, not for the convenience of machines, thank you very much - if I have a bunch of small related classes, it's much better to have them in the same file.

No stack objects means that if you want to write a function that returns multiple values (e.g. a function that multiplies a matrix by a vector and returns a vector) you have to allocate an object to put the vector components in, which is very slow. For reproject I ended up just doing everything with plain values in a small number of functions - in C++ the code could have been much better layered.
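
For contrast, here's a quick C++ sketch (my own example, not the reproject code) of why stack objects help: a multi-value result can be returned as a plain struct by value, with no allocator involved, so the matrix-vector helper can stay a separate, cheap function:

  struct Vec2 { double x, y; };
  struct Mat2 { double a, b, c, d; };

  // Returned by value - the result lives on the stack, no heap allocation.
  Vec2 multiply(const Mat2& m, const Vec2& v)
  {
      return { m.a * v.x + m.b * v.y,
               m.c * v.x + m.d * v.y };
  }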

More riffing on software licenses

Wednesday, September 14th, 2011

Suppose you've written some software from scratch on your own time, using no resources from your employer. Suppose also that, having written it, you're proud of it and want to show it off to other people, perhaps get them to use it and give you feedback on it.

When doing this, the question invariably comes up of which software license you should publish it under, especially if other people would also like to contribute to it. If your aim is to eventually publish it as proprietary software you might choose a restrictive license like the Microsoft Reference Source License (Ms-RSL) which allows people to look at the source code but not to redistribute it. If your aim is to further the cause of eliminating all proprietary software you might choose a copyleft license like the GNU General Public License (GPL) which forbids use of the code in proprietary software. If your aim is to allow the software to end up running in as many places as possible, you might choose a permissive license like BSD.

Generally one will have multiple goals. For example:

  1. I want the software to be as useful as possible to its users.
  2. I want the software to do the greatest good for the greatest number of people. Note that this isn't the same as (1), since software can be useful to those running it without doing good to society as a whole - consider the case of the software controlling the machines at the cigarette factory.
  3. I want the software to be as widespread as possible to maximize my reputation.
  4. I want to make as much money as possible from the software, one way or another.

I think it's pretty difficult to predict at the start of a program's lifecycle which license is going to have which effect on which goals. Therefore the choice tends to be made on philosophical principles rather than sound science, such as "I believe the greatest good is done by making life as difficult as possible for developers of proprietary software" or "I believe that putting as few restrictions as possible on developers will make the software as useful as possible" or "I can maximize my chances of making money with this software by forbidding its distribution".

The philosophical principle which currently seems to make most sense to me is that goals 1, 3 and 4 are the important ones: the best way in the long term to make lots of money out of my personal programming projects is to make a good reputation for myself, which means getting my software running in as many places as possible, which means making it as useful as possible. Goal 2 would be nice but it's impossible to predict the long-term social good of any given action. The money spent on Microsoft software has done an enormous amount of good, for example, but that doesn't necessarily mean that all Free Software authors should drop what they are doing and contribute to proprietary software instead.

Suppose I write some piece of software X, and software developer Y takes this work and incorporates it into product Z, adding proprietary feature W and releasing the result as proprietary software. I haven't really lost anything in this situation - I (and the other users of X) can still do all the same things we could do before Y got involved (we can just pretend that product Z doesn't exist if we prefer to). Users who buy Z benefit because feature W is worth more to them than the cost they paid (both in money and reduced freedom) - they have one option that they didn't have before Y got involved. I (and the other users of X) benefit from publicity that Y creates around Z (especially if they don't try to hide the fact that Z is based on X) since that may itself draw more users to X (users who would not have otherwise heard about X and for whom W is not worth the cost).

As I see it there are a few disadvantages to copyleft licenses. One is that they are incompatible with many existing software licenses, even other copyleft ones (for example, one cannot take GNU GPL code and use it in software that is under an incompatible but still copyleft license). Another is that many companies have an aversion to copyleft licenses, eschewing such software entirely for the (not unfounded) fear that an accidental inclusion of copyleft software in their proprietary products will cause said products to become copyleft. In such cases the advantages for X described above never materialize.

Compatibility with other licenses is a big advantage for a permissive license - the BSD license is compatible with the GPL, for example (meaning you can take BSD code and release it under the GPL - the reverse is forbidden). Can we do better than the BSD license for compatibility? I think we can - even the BSD license includes some restrictions, such as the one that the warranty disclaimer must remain in place. I don't really see the advantage of warranty disclaimers - they seem like a CYA piece of legalese which doesn't actually do anything except take up space in licenses. I'm not a lawyer, but has anybody ever been sued because free software they wrote misbehaved and didn't have a warranty disclaimer? I doubt it - it seems like we can't even prosecute malware authors, who not only don't disclaim warranty but write software that's actively harmful.

My preferred software license would therefore be the most permissive one possible - essentially equivalent to not being copyrighted at all (the same as "public domain" in many places). This state of affairs is compatible with any license that a third party chooses to use when redistributing the software, and maximum compatibility means greatest usefulness. Now, not all contributors to my software will necessarily feel the same way, and that's okay - if Richard Stallman wants to contribute but only on condition that his changes are GPL, he can just make a GPL fork of my software and contribute to that. Others might contribute to my fork (if they share my views) or to the GPL fork (if they share Stallman's).

One could even have a change-tracking system which keeps track of the license associated with each change, and can therefore tell you which license you may distribute the software under if you include changes A, B and C. Forks could then compete on the merits of the resulting software - i.e. if one license tends to get more and better patches than another, its quality will rise and it will tend to be the one that gets used, so further patches will build on it and use the same license. If one particular license causes the software to be less useful, that license should die out. It would be an interesting ecosystem experiment. My suspicion is that the root "public domain"-like license would tend to win more often than not, because I would expect most programmers to think the same way I do, preferring to get their patch into all the license forks rather than just one. Also, restrictions can always be added in the future, but removing them is much more difficult.
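
A toy model of that change-tracking idea (entirely hypothetical, and it pretends license restrictiveness forms a simple total order, which real license compatibility does not):

  #include <algorithm>
  #include <string>
  #include <vector>

  // Ordered from least to most restrictive, for the purposes of this toy model.
  enum class License { PublicDomain, BSD, GPL };

  struct Change { std::string description; License license; };

  // The license you can distribute under is the most restrictive one among
  // the changes you choose to include.
  License required_license(const std::vector<Change>& included)
  {
      License result = License::PublicDomain;
      for (const auto& c : included)
          result = std::max(result, c.license);
      return result;
  }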

Downsides of copyleft

Tuesday, September 13th, 2011

As well as being not invulnerable and somewhat unnecessary, copyleft licenses may also be a little bit harmful to the software they protect, at least compared to permissive licenses.

The trouble with Free Software licenses (especially restrictive ones) is that they have a tendency to be incompatible with other licenses (even other Free Software licenses). Consider the sad case of the GCC manuals which underwent discussion on the GCC developers mailing list a while back. GCC is licensed under the GPLv3, whilst the documentation is licensed under the GFDL. Both of these are Free Software licenses, but because they contain slightly different restrictions, they are mutually incompatible - you can't take GCC source code and copy it into the GCC manual, nor can you take parts of the GCC manual and copy them into the source code. This means that the technique of literate programming is impossible - documentation can't be automatically generated from the code.

This could be solved if there was one single copyleft license that was used for everything, but I can't see that happening if the GNU project can't even standardize on a single license for its own projects.

It's important to note that these problems are caused by the restrictions imposed by the license, not the freedoms it guarantees - the same problems would occur with proprietary software, if the corresponding situation ever occurred.

Benefits of copyleft without copyleft

Monday, September 12th, 2011

Yesterday I showed that copyleft is not invulnerable. Today I will show that it is also (at least somewhat) unnecessary.

Let's again suppose that Microsoft wants to make a proprietary MS-GCC. They make version 1 (based on GCC 4.6.1, say) and it all works out great. But then GCC 4.6.2 becomes available and the proprietary patches don't work with the latest version any more. Microsoft wants to take advantage of the changes in 4.6.2 (otherwise MS-GCC will look bad compared to the latest FSF-GCC), so they re-do their patches. Each time a new GCC becomes available, Microsoft has some work to do to keep up.

It costs Microsoft money to keep doing this, so they look for an alternative. One such alternative is to assign copyright of these patches to the Free Software Foundation and contribute them to FSF-GCC. This might be a bit more work than just updating the patches, but it only needs to be done once. Then, further versions of GCC will include the Microsoft patches without any additional work on Microsoft's part.

This does, of course, mean that those proprietary changes become Free Software. That might not be such a big deal - chances are that if MS-GCC was really that much more useful than GCC, the equivalent functionality would be duplicated by GCC developers anyway (especially now that they know it's possible and useful).

So the "carrot" part exists independently of the "stick" part (the GPL) - even without any threat of a copyright violation lawsuit, there are still reasons to contribute your changes.