Archive for the ‘computer’ Category

Linux cleartype subtly broken

Wednesday, September 17th, 2008

Most modern graphical desktops have an option to render fonts taking into account the positions of the sub-pixels on liquid crystal displays. Since these are always in the same positions relative to the pixels, one can subtly alter the horizontal position of something by changing its hue. This effectively triples the horizontal resolution, which can make for a great improvement in the readability of text.
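To make the idea concrete, here is a toy Python sketch of how one row of pixels might be rendered, assuming the common left-to-right red/green/blue stripe layout and simply point-sampling the glyph coverage at each sub-pixel's centre (real renderers filter rather than point-sample, precisely to keep the colour fringes under control; all the names and numbers here are made up for illustration):

    # Toy sub-pixel rendering of one row: sample coverage at thirds of a pixel
    # and drive each colour channel from its own sample, instead of averaging
    # all three into a single grey value.
    def subpixel_row(coverage, width_px):
        """coverage(x) -> ink coverage in [0,1] at horizontal position x (in pixels)."""
        row = []
        for px in range(width_px):
            r = coverage(px + 1/6)   # red sub-pixel occupies the left third
            g = coverage(px + 3/6)   # green the middle third
            b = coverage(px + 5/6)   # blue the right third
            row.append((1 - r, 1 - g, 1 - b))   # black ink on a white background
        return row

    # A vertical stem one pixel wide, starting halfway through pixel 1:
    stem = lambda x: 1.0 if 1.5 <= x < 2.5 else 0.0
    print(subpixel_row(stem, 4))

The output shows the left and right edges of the stem landing on different sub-pixels, which is exactly where the coloured fringes described below come from.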

Unfortunately, my Ubuntu desktop doesn't do this correctly - some bits of text have a yellowish fringe and some bits of text have a bluish fringe, both of which are quite distracting. The problem is that while you can alter the horizontal position of something at sub-pixel intervals, you can only alter its width by whole pixels (otherwise the hue changes don't cancel out, and integrating over a region of the screen gives a yellow or cyan colour).

I've therefore had to switch my desktop to use grayscale anti-aliasing, which is a bit more blurry. Fortunately the pixels on my monitor are small enough that this doesn't bother me very much. I do prefer the font rendering that Windows does, though. While FreeType does apparently include code to support Microsoft's patented programmed hinting, I can't seem to get the font rendering on Linux to look as good as it does on Windows.

Equations for 3D graphics

Tuesday, September 16th, 2008

When I first learnt how to do 3D graphics it was as a recipe. You take your 3D world coordinates x, y and z and rotate them (rotation matrices around the coordinate axes are particularly easy to write down). If you want perspective, you then divide x and y by z (for isometric views just ignore z). Next you scale the result by the number of pixels per world-coordinate unit at z=1 and translate so that x=y=0 is in the center of the screen.
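As a rough sketch (in Python, with made-up example values), the recipe looks something like this:

    import math

    def project(x, y, z, angle=0.3, pixels_per_unit=400, cx=320, cy=240):
        # rotate about the vertical (y) axis
        xr = x * math.cos(angle) + z * math.sin(angle)
        zr = -x * math.sin(angle) + z * math.cos(angle)
        yr = y
        # perspective divide (for an isometric view, skip this and just drop zr)
        xp, yp = xr / zr, yr / zr
        # scale by pixels per world-coordinate unit at z=1, translate to the screen centre
        return cx + xp * pixels_per_unit, cy - yp * pixels_per_unit

    print(project(1.0, 0.5, 5.0))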

This worked great, but didn't tell me why this was the right thing to do.

Later, reading about ray tracing I realized what was really going on. Your model is in 3D space but also in 3D space are some additional points - the entrance pupil of your eye and the pixels that make up your screen. If you imagine a straight line from your eye passing through a particular point of your model and going on to infinity, that line may also pass through the screen. If it does, the point on the screen that it passes through corresponds to the pixel on the physical screen at which that point should be drawn.

Since computer screens are generally rectangular, if the positions of the corners of the screen are a, b, c and d (a and d being diagonally opposite to each other) the position of the fourth corner can be determined from the first three by using the equation a+d=b+c. So to fix the position, orientation and size of the screen we only need to consider 3 corners (a, b and c). We also need to consider the position of the eye, e, which is independent of a, b and c. We thus have 12 degrees of freedom (3 coordinates each for a, b, c and e). Three of these degrees of freedom correspond to translations of the whole screen/eye system in 3D space. Two of them correspond to orientation (looking up/down and left/right). Two correspond to the horizontal and vertical size of the screen. Three more give the position of the eye relative to the screen. One more gives the "twist" of the screen (rotation along the axis between the eye and the closest point on the screen plane). That's eleven degrees of freedom - what's the other one? It took me a while to find it but eventually I realized that I was under-constraining a, b and c - the remaining degree of freedom is the angle between the top and left edges of the screen (which for every monitor I've ever seen will be 90 degrees - nice to know that this system can support different values for this though).
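Here is a small Python sketch of the screen-and-eye model (the corner and eye coordinates are invented purely for illustration): given the eye e and three screen corners a, b and c, a world point p is projected by intersecting the line from e through p with the plane of the screen.

    import numpy as np

    a = np.array([-0.25,  0.15, 0.0])   # one corner of the screen
    b = np.array([ 0.25,  0.15, 0.0])   # the corner along the top edge from a
    c = np.array([-0.25, -0.15, 0.0])   # the corner along the left edge from a
    e = np.array([ 0.0,   0.0,  0.7])   # the eye, in front of the screen

    def project(p):
        # Solve e + t*(p - e) = a + u*(b - a) + v*(c - a) for (t, u, v).
        m = np.column_stack((p - e, a - b, a - c))
        t, u, v = np.linalg.solve(m, a - e)
        return u, v   # screen coordinates: (0,0) is corner a, (1,1) the corner opposite

    print(project(np.array([0.0, 0.0, -5.0])))   # lands at (0.5, 0.5), the screen centre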

If you solve the equations, this system turns out to give exactly the same transformations as the original recipe, only a little more general and somewhat better justified. It also unifies the perspective and isometric views - isometric is what you get if the distance between the eye and the screen is infinite. Obviously if you were really infinitely far away from your computer screen you wouldn't be able to see anything on it, which is why the isometric view doesn't look as realistic as the perspective view.

Many 3D graphics engines allow you to set a parameter called "perspective" or "field of view" which effectively increases or decreases how "distorted" the perspective looks and how much peripheral vision you have. This is essentially the same as the eye-screen distance in my model. To get the most realistic image the FoV should be set according to the distance between your eyes and your screen.
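The relationship is just basic trigonometry; as a sketch with invented numbers, a 50cm-wide screen viewed from 70cm away gives a horizontal field of view of roughly 39 degrees:

    import math

    screen_width = 0.5   # metres
    eye_distance = 0.7   # metres
    fov = 2 * math.atan2(screen_width / 2, eye_distance)
    print(math.degrees(fov))   # about 39.3 - the FoV that gives the most realistic image here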

Cheating in online games

Friday, September 12th, 2008

Many modern online games go to great lengths - some of them quite invasive - to try to prevent players from cheating. I hope that in the future our computer systems are architected to make it trivial to subvert such malware, but where does that leave players of on-line games who want to avoid playing against cheaters?

Well, I hope that in the future, online games are designed in such a way that cheating is impossible - that is, no information that would allow a player to gain an unfair advantage is sent to their computer, and no information received from a player's computer is trusted to be free of tampering. In such a game, a player would be allowed to use whatever client software they like, and would be encouraged to use client software that gives them the best odds. Serious golf players pick the clubs that give them the best advantage, tennis players pick the best tennis rackets and so on - why should on-line gaming be any different?

One implication of this is that the best games will be Turing tests of a sort - a game of aiming accurately and reacting quickly won't be much of a challenge since the computer can do those things better than human players. Tasks that humans can do well but computers are poor at will need to be integral to the game.

That's not to say that games of aiming accurately and reacting quickly won't exist, but if they are played online they will be played between people who know each other and trust each other not to cheat, not between strangers.

Static scoping improved

Thursday, September 11th, 2008

Many programming languages have a facility (usually called "static") to allow the programmer to declare a variable which is visible only to some particular object but has storage at the program's scope - i.e. its value is the same for all instances of that object, and when it changes for one instance it changes for all the others too.

One programming language feature I've never seen (but which I think would be useful) is a generalization of this - the ability to declare a variable which is visible only in a particular object but which has storage at the scope of the (lexical) parent object. I call this "outer". For top-level objects this would be the same as static, but for a nested class there would be one copy of the variable per instance of the enclosing class, shared by all the nested objects belonging to that instance.

One could even use the "outer" keyword multiple times to put the variable at any particular level of the object nesting tree. This doesn't violate encapsulation, since members can still only be declared inside their classes.
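The feature doesn't exist in any language I know of, but it can be emulated by hand. Here is a rough Python sketch of the idea (all names are made up): the "outer" variable count gets one copy per Counter, shared by every Item created from that Counter, much as static would give one copy shared by the whole program.

    class Counter:
        def __init__(self):
            self._count = 0               # the "outer" variable: one slot per Counter

        class Item:
            def __init__(self, owner):
                self._owner = owner       # link back to the lexical parent instance

            def bump(self):
                self._owner._count += 1   # shared by every Item of this Counter

    c1, c2 = Counter(), Counter()
    a, b = Counter.Item(c1), Counter.Item(c1)
    a.bump(); b.bump()
    print(c1._count)   # 2 - shared between a and b
    print(c2._count)   # 0 - a different Counter has its own copy

The difference from the real feature is that here _count is visible to the whole of Counter, whereas "outer" would keep it visible only inside Item.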

If you have "outer" instead of "static" (and maybe a few other more minor tweaks) any program can be turned into an isolated object inside another program - i.e. you can easily turn a program into a multi-instance version of that program with all the instances running in the same process.

Desktop security silver bullet

Wednesday, September 10th, 2008

Suppose there were a way to write a desktop operating system such that no malware (defined as any software running on a user's system that puts the interests of its authors before the interests of the user) could have any ill effects. Would it be implemented? Probably in open source, but I don't think Apple or Microsoft would include such technologies. Why? Because they wish to put their interests ahead of their users', running code on customers' machines which implements such malware as copy-prevention (DRM), anti-cheating mechanisms in games and debugger detectors. Such a security system would make it very easy to work around this bad behaviour (it's always possible to work around it, but currently not always easy).

If such a security mechanism is possible, I think the way to do it would be through API subversion. When process A starts process B, it can ask the OS to redirect any system calls that process B makes to process A, which can then do what it likes with them - anything from passing them on to the OS unchanged to pretending to be the OS to process B. Since malware (on a well-designed OS) will need to use system calls to find out anything about its environment, it is impossible for process B to tell whether it has been subverted or not. Any filing system operation can be sandboxed to make process B think it has changed critical system files. Even the clock APIs can be subverted to make process B think it is running as fast as it should be.

Once this infrastructure is in place, you can sandbox (say) your web browser so that it can make no changes outside its cache, cookies and downloads directories. Any untrusted process can do essentially nothing to the system without being given permission by the user, but it can't even tell whether that permission has been given or not, so it makes no sense for it to ask. This essentially solves the "dancing bunnies" problem - any request by malware for extended capabilities can (and most likely will) be ignored, since the user would have to go through extra steps to grant it, and those steps in no way make dancing bunnies any more likely to appear.
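As a very rough illustration of the shape of such a sandbox (real subversion would have to happen at the system-call boundary, outside the sandboxed process, as discussed below - this in-process Python sketch with made-up names only shows the layering idea):

    class RealOS:
        def open(self, path, mode="r"):
            return open(path, mode)

    class SandboxFS:
        """Pass calls through, but quietly divert writes outside an allowed directory."""
        def __init__(self, lower, allowed_dir):
            self.lower, self.allowed = lower, allowed_dir

        def _redirect(self, path, mode):
            if "w" in mode and not path.startswith(self.allowed):
                return self.allowed + "/shadow_" + path.replace("/", "_")
            return path

        def open(self, path, mode="r"):
            return self.lower.open(self._redirect(path, mode), mode)

    sandbox = SandboxFS(RealOS(), "/home/user/.browser-sandbox")
    print(sandbox._redirect("/etc/hosts", "w"))   # the write quietly lands in the sandbox

The layers compose: a SandboxFS can sit on top of another SandboxFS just as easily as on top of the real OS, and the code underneath can't tell the difference.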

One problem with this scheme is that the time it takes to do a system call is multiplied by the number of subversion layers it goes through. One can't use in-process calls because then the malware would be able to examine and modify the system calls by reading and writing its own process memory. So one would need to use the techniques described here.

Variable APIs should be string-based

Tuesday, September 9th, 2008

If you've got a big, complex application programming interface there are generally two ways to structure it. You can assign an integer to each function (or method, if it's an object-oriented interface) and share the mapping between integers and method names between the callee and caller, or you can provide a single function or method which takes the method name as a string.
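A minimal sketch of the two styles (the function names and numbers are invented for illustration):

    def get_version():
        return "1.0"

    def set_mode(mode):
        return "mode set to " + mode

    # Integer-based: caller and callee must agree on this table ahead of time,
    # and keep it in sync across every release.
    FUNC_GET_VERSION = 0
    FUNC_SET_MODE = 1
    NUMBER_TABLE = {FUNC_GET_VERSION: get_version, FUNC_SET_MODE: set_mode}

    def call_by_number(func_id, *args):
        return NUMBER_TABLE[func_id](*args)

    # String-based: the name is resolved at call time, so new functions can be
    # added (or probed for) without renumbering anything.
    NAME_TABLE = {"get_version": get_version, "set_mode": set_mode}

    def call_by_name(name, *args):
        return NAME_TABLE[name](*args)

    print(call_by_number(FUNC_SET_MODE, "fast"))
    print(call_by_name("set_mode", "fast"))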

The first way is generally used in older and more speed-sensitive code, since it has historically been a bit faster (though for most purposes it probably isn't significantly so now, especially for out-of-process calls - the context-switch time will dwarf the string-lookup time).

The second way is used on the internet (where everything is text based and the network dominates most timings). It's also used in Windows DLLs (though this is more of a hybrid - the names are resolved to locations at DLL load time). COM, on the other hand, uses the first method (and has a lot of complicated registry/interface/type-library goo to make sure the numbers don't get out of sync).

I think that any time you have an API which is sufficiently complicated that it may change with future software or hardware releases, the string-based method is the way to go. Especially for something like this. Keeping track of all the possible variations of the mapping between method names and integers is always going to get sufficiently complicated that you might as well have the computer do it.

Coping with slow API calls

Wednesday, August 27th, 2008

I was thinking some more about this recently. Specifically, APIs are generally implemented as DLLs or shared libraries which you load into your process and which expose functions that can simply be called. But in a world with no DLLs, how do you do API calls?

At a fundamental level, there are two things you can do - write to some memory shared by another process and make system calls. The former is useful for setting up calls but not for actually making them, as the process implementing the API has no way to notice when the memory changes (unless it polls, which is not a good solution). But system calls are expensive - the context switch alone could be many thousands of CPU cycles.

I mentioned in the linked post that bandwidth for inter-process communication should not be a problem but I neglected to mention latency - if you can only make API calls a few tens of thousands of times per second, certain operations could become very slow.

However, the latency of any single call is still far below what could be noticed by a human being - the only perceived speed problems caused by this architecture would be if many thousands of API calls were made as a result of a single user action.

I think the answer is to design APIs that aren't "chatty". Such designs are already in common use in database situations, where high-latency APIs have been the rule for a long time. Instead of having getCount() and getItemAtIndex() calls to retrieve a potentially large array of data, you retrieve a "cursor" which can return many records at once.
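As a sketch of the difference (hypothetical interfaces, with each method call standing in for one expensive cross-process round trip):

    class ChattyList:
        def __init__(self, items): self._items = items
        def get_count(self): return len(self._items)              # one round trip
        def get_item_at_index(self, i): return self._items[i]     # one round trip per item

    class CursorList:
        def __init__(self, items): self._items = items
        def fetch(self, start, batch_size):                       # many records per round trip
            return self._items[start:start + batch_size]

    data = list(range(10000))

    chatty = ChattyList(data)
    total = sum(chatty.get_item_at_index(i) for i in range(chatty.get_count()))   # ~10001 calls

    cursor, total2, start = CursorList(data), 0, 0
    while True:
        batch = cursor.fetch(start, 1000)                          # ~11 calls
        if not batch:
            break
        total2 += sum(batch)
        start += len(batch)

    print(total, total2)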

Another possibility is for APIs themselves to be able to execute simple programs. Again this is an idea used by databases (SQL, the language usually used to access databases, is a language in its own right). Such programs should not be native code (since that gives all the same problems as DLLs, just backwards) but could be interpreted, or written in some sort of verifiable bytecode (which could even be JIT compiled for some crazy scenarios). The language they are written in need not even be Turing-complete - this is just an optimization, not a programming environment.
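A sketch of what such a mini-language might look like (everything here - the record format, the operations, the names - is invented for illustration): the caller ships a short list of operations, the API runs them next to the data, and only the final result crosses the process boundary. There are no loops or recursion, so the language is deliberately not Turing-complete.

    RECORDS = [{"name": "a", "size": 10}, {"name": "b", "size": 200}, {"name": "c", "size": 35}]

    def run_query(program):
        """program is a list of (op, arg) pairs executed in order - no loops, no recursion."""
        result = RECORDS
        for op, arg in program:
            if op == "filter_gt":
                field, limit = arg
                result = [r for r in result if r[field] > limit]
            elif op == "project":
                result = [{k: r[k] for k in arg} for r in result]
            elif op == "limit":
                result = result[:arg]
        return result

    print(run_query([("filter_gt", ("size", 20)), ("project", ["name"]), ("limit", 10)]))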

If all else fails, there is still another way to get low-latency, high-bandwidth communication between chunks of code which don't know about each other's existence, and that is to create a whole new process. The APIs involved return chunks of code (perhaps in the form described here) and the calling program compiles these (together with some code of its own) into a temporary chunk of binary code, which then gets an entirely new process to execute in. This moves potentially all of the IPC costs to a one-time startup phase. The resulting design resembles Java or .NET in some respects.

This is kind of like the opposite of the Singularity OS: the reliability guarantees are still enforced by the compiler, but the OS allows you to use all the user-mode features of your CPU (not just the ones supported by your runtime) and processes are isolated by the MMU rather than by software alone.

rm for the web

Monday, August 25th, 2008

My first thought on reading this article was "what would an rm for the web look like?". It seems that many of the commenters on that post had the same thought.

One possible way this could work is to set up a proxy that people can configure their browsers to use. Then, if they don't want to see a particular website any more, they can visit the "rm" website, enter the address of the site they no longer want to see, and the proxy will block it from then on. The "rm" website should of course also provide some way to remove a site from the blocked list.
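A minimal sketch of the blocklist check such a proxy might perform (the names are made up, and a real proxy would also have to deal with HTTPS, redirects and so on):

    from urllib.parse import urlparse

    BLOCKED = set()

    def rm(host):
        BLOCKED.add(host.lower())

    def unrm(host):
        BLOCKED.discard(host.lower())

    def allow_request(url):
        host = (urlparse(url).hostname or "").lower()
        return not any(host == b or host.endswith("." + b) for b in BLOCKED)

    rm("example-shock-site.com")
    print(allow_request("http://www.example-shock-site.com/page"))   # False - blocked
    print(allow_request("http://example.org/"))                      # True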

There are several reasons why such a thing might be useful. There are a few "shock websites" out there which most people would not want to visit knowingly. It could significantly cut down on the number of instances of rickrolling and spyware infections. It could enable organizations to take control of their own web filtering needs rather than relying on some off-the-shelf solution which is bound to block some sites that it shouldn't. Of course it would be easy to circumvent, but that's not the point - the idea isn't to prevent people from seeing stuff they really want to see, it's to keep them honest and keep them from accidentally stumbling onto unsavory areas of the web.

The trouble with editors

Wednesday, August 20th, 2008

Since changing jobs I now mostly work with Unix machines rather than Windows ones. Mostly it's a not too unpleasant experience (once things are set up the way I like them) - Unix seems to be much more flexible and configurable than Windows, even if some things don't work quite so well out of the box.

One area I have run into difficulties with is editing text files. When I first started using DOS I used to use edlin, which is horrible. After a while I discovered QEdit on a magazine cover disk and loved it - very quick to load and scroll (even on an 8MHz 8086 machine), easy to use but with powerful macro facilities there when you need them.

I used that up until 2002, when I switched from Windows 98 to Windows XP and QEdit stopped working so well. I bought TSE (the updated, renamed QEdit) and it was worth every penny - everything that was great about QEdit was the same or better in TSE and I've used it almost every day since for almost every text editing task (though I have been known to use the Visual Studio editor occasionally for its Intellisense goodness). Moving from QEdit to TSE I discovered that the Windows shortcuts for select, copy and paste (shift+navigation to select, Ctrl-X to cut, Ctrl-C to copy and Ctrl-V to paste) were much quicker and easier than the QEdit defaults (Ctrl-K B, Ctrl-K K and so on, which I think originated in WordStar).

When I switched to Linux, TSE was one of the things I missed most. It works under Wine, but not particularly well (it always defaults to full screen, copy/paste integration doesn't work and scrolling is painfully slow). A Linux version of TSE is in the works but doesn't show any sign of materializing soon.

Joe is the closest thing to TSE that Linux has to offer, but it's a bit limited and just different enough to be annoying. It also isn't available on all of the machines that I need to use. That last requirement basically means I have to make the big choice - vi or emacs?

I gravitated towards vi initially as it seemed to be a bit more lightweight, and I had some experience with it from my brief stint using Debian sometime around 1999-2000. I relearnt how to do the basics, put it into syntax highlighting mode and fixed the colours (dark blue on black is unreadable to me, and of course comments have to be in green). But it still drives me crazy - I never know which mode I'm in, so I'm always inserting ":q" into files or (worse) typing text which gets interpreted as random commands and then trying to figure out what I did and how to undo it.

I eventually found out that most people who develop GNU software use Emacs (I saw a lot of people using it at the GCC Summit) and that it has great features for writing C code according to the GNU guidelines. So I decided I had better learn Emacs. I had used it a bit at university (the Physics department gave us a brief tutorial in it as part of the Fortran course). I have also learnt a bit of Lisp in the intervening years and having Lisp as the macro language of an editor seemed to be a rather reasonable thing to do.

One big annoyance with all the Unix editors is that none of them support the Windows-style shortcuts I mentioned above (plus Ctrl-Z, Ctrl-Y, Ctrl-A and Ctrl-S for undo, redo, select-all and save) by default. I almost squealed with delight when I discovered the CUA mode in Emacs 22, which brings it significantly closer to being the perfect editor (at least once I figured out how to tweak various terminal settings to actually make it work - this helped). Unfortunately many of the machines at work still use Emacs 21, so there was some more fiddling required to install the CUA package.

Then I need to figure out if I can get the scrolling behavior to work like QEdit's, fix up some behaviors related to tabs and (once again) fix the syntax highlighting colours. "Eventually I'll have everything the way I like it" may be the mantra of a great many Linux users...

Part of the reason this is so difficult is that customizing editors is generally considered an advanced thing to do - you really have to learn the basics first. But if the first thing you want to do with a new editor is make it act like your old one, you first have to learn the default way of doing things, then do the customization, then unlearn the default way of doing things. My natural inclination is to take shortcuts - try to figure out the advanced stuff without really getting the hang of the basics first. This does seem to have a tendency to complicate things.

String storage

Sunday, August 17th, 2008

Most applications store strings as a consecutive sequence of characters. Sometimes when the string is copied the characters are copied too. Sometimes strings are reference counted or garbage collected to minimize this copying, but copies are still made when concatenating and performing other "string building" operations (otherwise the characters would no longer be consecutive in memory).

An alternative that might work better (especially for something like a compiler) would be to do the concatenation lazily. Actual character data comes from just a few places (the input files which are kept in memory in their entirety, static character data, and program argument data). There are two subtypes of string - one consists of a pointer to the first character and an integer recording the number of characters in the string. The other subtype consists of a vector of strings which are to be concatenated together. Integers (and maybe also formatting information) could be kept in other subtypes. The resulting tree-like data structure has a lot in common with the one I described in Lispy composable compiler.
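A rough Python sketch of the idea (the names are made up): leaf strings are just views into character data that already exists somewhere, and concatenation builds a vector of parts without copying any characters until the string is finally written out.

    class Slice:
        """A view into an existing buffer: the (pointer, length) subtype."""
        def __init__(self, buffer, offset, length):
            self.buffer, self.offset, self.length = buffer, offset, length
        def write_to(self, out):
            out.append(self.buffer[self.offset:self.offset + self.length])

    class Concat:
        """A vector of strings to be concatenated - no character data is copied here."""
        def __init__(self, parts):
            self.parts = list(parts)
        def write_to(self, out):
            for p in self.parts:
                p.write_to(out)

    def to_str(s):
        out = []
        s.write_to(out)
        return "".join(out)   # characters are only copied once, right at the end

    source = "int main() { return 0; }"   # e.g. an input file kept in memory
    name = Slice(source, 4, 4)            # "main", borrowed from the source text
    msg = Concat([Slice("error in ", 0, 9), name])
    print(to_str(msg))   # error in main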

I'm not sure if this actually saves much (if anything) in terms of memory space or speed over the usual methods (I suppose it depends on how long the average basic string chunk is), but it does have at least one potential advantage - vectors (especially if they grow by doubling) will have many fewer possible lengths than strings, so memory fragmentation may be reduced. I think it's also kind of neat (especially if you have such data structures lying around anyway).