Purpose of U90 in XT second revision board

September 21st, 2012

This is an expanded version of a post I originally made over at the Vintage Computer Forums.

My plan to write a cycle-exact PC/XT emulator is not going to work as well as I had hoped if the various revisions of the PC and XT that IBM made over the years turn out not to be cycle-exact with respect to each other. So I took a look at the differences between the revisions to try to understand whether there would be any such differences.

One particularly interesting revision occurred between the first and second revisions of the XT - a whole new chip (U90) was added. The designers of the first revision board had the forethought to leave a space where a standard 14-pin 74LSxx TTL logic IC could be placed, so that revisions requiring extra gates could be made without completely redesigning the board. In the second revision this spare space is populated, but I wasn't able to find any information online about what that change actually did, so I compared the schematic in the first-revision technical reference manual I had with the updated schematic in the March 1986 technical reference. The difference seems to be this:

In the first revision, input D3 to U88 is -S0 AND -S1 AND -LOCK AND HRQDMA. In the second revision, this is changed to (-S0 AND -S1 AND -LOCK AND HRQDMA) OR (NOT (-DMACS' OR -XIOW')) (U90 is 2-input OR gates, probably a 74LS32). In other words, this input is forced high when writing to the DMA controller registers (IO ports 0x00-0x1f: the 8237). The output Q3 from U88 goes through a couple more flip-flops to HOLDA (the hold acknowledge input to the 8237).
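To make the gate change concrete, here's a small C++ sketch (my own modelling, with made-up function names - nothing like this appears in the technical reference) that treats each signal as a bool holding its logic level, so the added term !(nDMACS || nXIOW) is true exactly when both -DMACS' and -XIOW' are active (low), i.e. when the CPU is writing to the DMA controller's registers:

    // U88's D3 input before and after the U90 change.  true = logic high, so
    // e.g. nDMACS holds the level of the active-low -DMACS' line.
    bool u88D3FirstRevision(bool nS0, bool nS1, bool nLOCK, bool HRQDMA)
    {
        return nS0 && nS1 && nLOCK && HRQDMA;
    }

    bool u88D3SecondRevision(bool nS0, bool nS1, bool nLOCK, bool HRQDMA,
                             bool nDMACS, bool nXIOW)
    {
        // The extra 74LS32 OR gate forces D3 high whenever -DMACS' and -XIOW'
        // are both low, i.e. during a CPU write to I/O ports 0x00-0x1f.
        return (nS0 && nS1 && nLOCK && HRQDMA) || !(nDMACS || nXIOW);
    }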

So I guessed that this is some logic to fix a race condition between the CPU and the DMA controller. I suspected that there was a rare situation where, if the CPU writes something to the DMA controller at the same time that a DMA access happens, the CPU and the DMA controller end up waiting for each other and the system locks up completely. Sergey (of Sergey's XT fame) confirmed it - the logic finds passive CPU cycles when a DMA transfer is requested, puts the CPU in wait mode and activates the DMA controller using the HOLDA signal.

I guess that this lockup doesn't happen every time the DMAC is accessed during a DMA transfer, or the machine would hang in something like 1 in 18 floppy drive accesses due to the DRAM refresh DMAs. So presumably some "harmless" DMAs will also get delayed by a bus cycle when they happen at the same time as a DMAC register access. That suggests that it should indeed be possible for software to determine (without checking the BIOS) whether it is running on a first or second revision XT board. And, in turn, if I rely on a piece of software running in an exact number of cycles on one machine, it may run in a different number of cycles on the other, at least if the DMA controller is accessed.

For the demo application I'm thinking of, this probably doesn't matter too much - the only time I'm planning to access the DMA controller is to initiate floppy disk transfers to stream in the next part of the demo, and those DMAs will occur at unpredictable times anyway (since the floppy controller has its own crystal). However, it might be interesting to try to write a piece of software that uses small timing differences like these to get all sorts of information about exactly what computer it's running on. As well as this U90 change, it might be possible to tell PCs from XTs by checking the cassette hardware, and it may also be possible to distinguish between many of the different variants of the 8088.

Fractional exponent floating point numbers

September 20th, 2012

The usual binary representation of floating point numbers is a sign/exponent/mantissa representation. To get an idea of what this means, consider a hypothetical 6-bit floating point format with 1 sign bit, 3 bits of exponent and 2 bits of mantissa. That gives us 64 possibilities, which is sufficiently few that it's practical to list all the possible values:

0 000 00 +0   1 000 00 -0
0 000 01 +0.0625   1 000 01 -0.0625
0 000 10 +0.125   1 000 10 -0.125
0 000 11 +0.1875   1 000 11 -0.1875
0 001 00 +0.25   1 001 00 -0.25
0 001 01 +0.3125   1 001 01 -0.3125
0 001 10 +0.375   1 001 10 -0.375
0 001 11 +0.4375   1 001 11 -0.4375
0 010 00 +0.5   1 010 00 -0.5
0 010 01 +0.625   1 010 01 -0.625
0 010 10 +0.75   1 010 10 -0.75
0 010 11 +0.875   1 010 11 -0.875
0 011 00 +1   1 011 00 -1
0 011 01 +1.25   1 011 01 -1.25
0 011 10 +1.5   1 011 10 -1.5
0 011 11 +1.75   1 011 11 -1.75
0 100 00 +2   1 100 00 -2
0 100 01 +2.5   1 100 01 -2.5
0 100 10 +3   1 100 10 -3
0 100 11 +3.5   1 100 11 -3.5
0 101 00 +4   1 101 00 -4
0 101 01 +5   1 101 01 -5
0 101 10 +6   1 101 10 -6
0 101 11 +7   1 101 11 -7
0 110 00 +8   1 110 00 -8
0 110 01 +10   1 110 01 -10
0 110 10 +12   1 110 10 -12
0 110 11 +14   1 110 11 -14
0 111 00 +Infinity   1 111 00 -Infinity
0 111 01 SNaN   1 111 01 SNaN
0 111 10 QNaN   1 111 10 QNaN
0 111 11 QNaN   1 111 11 QNaN

So we can represent numbers up to (but not including) 16, in steps of (at finest) 1/16.
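For concreteness, here's a short C++ sketch (my own, just to reproduce the table above) that decodes all 64 bit patterns of this toy 1-3-2 format: normal numbers decode as (1 + m/4)*2^(e-3), the denormals as m/16, and the top exponent value gives the infinities and NaNs.

    #include <cmath>
    #include <cstdio>

    // Decode one 6-bit pattern of the toy sign/exponent/mantissa format.
    double decode(int bits)
    {
        int sign = (bits >> 5) & 1;
        int e = (bits >> 3) & 7;
        int m = bits & 3;
        double s = sign ? -1.0 : 1.0;
        if (e == 7)                                   // infinities and NaNs
            return m == 0 ? s * INFINITY : NAN;
        if (e == 0)                                   // denormals: no implicit leading 1
            return s * m / 16.0;
        return s * std::ldexp(1.0 + m / 4.0, e - 3);  // normals: implicit leading 1
    }

    int main()
    {
        for (int i = 0; i < 64; i++)
            std::printf("%d %d%d%d %d%d  %g\n", (i >> 5) & 1, (i >> 4) & 1,
                (i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1, decode(i));
    }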

The distribution of these numbers (on the positive side) looks like this:

As you can see it approximates an exponential curve but is piecewise linear. Despite there being 64 distinct bit patterns, only 55 real numbers are represented because of the infinities, Not-a-Numbers and the two zeroes. Near zero, the pattern is broken and the numbers between 0 and 0.25 have the same spacing as the numbers between 0.25 and 0.5 - these are called the "denormals" and cause enormous headaches for CPU manufacturers, OS developers, toolchain developers and developers of numerical software, since they often either don't work according to the standards or are much slower than the rest of the floating point numbers.

With IEEE-754 single precision floating point numbers, there are so many pieces to the piecewise linear curve (254 of them for each sign) that the piecewise linearity wouldn't be visible on a graph like this, and the proportion of the bit patterns corresponding to distinct real numbers is much higher (close to 255/256 rather than 7/8).

How different would the world be if we just used an exponential curve instead of a piecewise linear approximation of one? That is to say, what if instead of representing floating point number x as A*2^B for some integers A and B, we represented it as 2^(A/k) for some integer A and a constant integer k? In other words, we represent floating point numbers as the exponent of a fixed point fraction (hence, the name Fractional Exponent Floating Point or FEFP). Let's choose k such that the powers of two are the same as in the exponent/mantissa representation, i.e. k=4 in this case, and again list all the possibilities:

0 000 00 +0   1 000 00 -0
0 000 01 +0.1487   1 000 01 -0.1487
0 000 10 +0.1768   1 000 10 -0.1768
0 000 11 +0.2102   1 000 11 -0.2102
0 001 00 +0.25   1 001 00 -0.25
0 001 01 +0.2973   1 001 01 -0.2973
0 001 10 +0.3536   1 001 10 -0.3536
0 001 11 +0.4204   1 001 11 -0.4204
0 010 00 +0.5   1 010 00 -0.5
0 010 01 +0.5946   1 010 01 -0.5946
0 010 10 +0.7071   1 010 10 -0.7071
0 010 11 +0.8409   1 010 11 -0.8409
0 011 00 +1   1 011 00 -1
0 011 01 +1.1892   1 011 01 -1.1892
0 011 10 +1.4142   1 011 10 -1.4142
0 011 11 +1.6818   1 011 11 -1.6818
0 100 00 +2   1 100 00 -2
0 100 01 +2.3784   1 100 01 -2.3784
0 100 10 +2.8284   1 100 10 -2.8284
0 100 11 +3.3636   1 100 11 -3.3636
0 101 00 +4   1 101 00 -4
0 101 01 +4.7568   1 101 01 -4.7568
0 101 10 +5.6569   1 101 10 -5.6569
0 101 11 +6.7272   1 101 11 -6.7272
0 110 00 +8   1 110 00 -8
0 110 01 +9.5137   1 110 01 -9.5137
0 110 10 +11.3137   1 110 10 -11.3137
0 110 11 +13.4543   1 110 11 -13.4543
0 111 00 +16   1 111 00 -16
0 111 01 +SNaN   1 111 01 -SNaN
0 111 10 +QNaN   1 111 10 -QNaN
0 111 11 +Infinity   1 111 11 -Infinity

I moved the infinities to the "all 1s" representation, and just reserved 4 extra entries for NaNs (I'm not sure if anyone actually uses the facility of NaNs to store extra data, but a nice thing about this representation is that you can dedicate as few or as many bit patterns to NaNs as you like).

A couple of improvements: our "minimum interval" between adjacent numbers is now about 1/36 instead of 1/16 and (because of the NaN tweak) we can represent slightly higher numbers.

In this representation, there's only one "denormal" (zero) for each sign. Zero can't fit the pattern however we cut it, since our "A" value never reaches minus infinity, but I suspect people would complain if they couldn't put zero in their floating point variables. This does mean that the interval between zero and the next highest number is larger than the minimum interval.
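Here's the corresponding C++ sketch for the FEFP table (again my own code; the bias of 12 and k=4 are just the particular choices made above): the value is 2^((A-12)/4), where A is the 5-bit magnitude field, with A=0 reserved for zero, A=29 and 30 for the NaNs and A=31 for infinity.

    #include <cmath>
    #include <cstdio>

    // Decode one 6-bit pattern of the toy FEFP format.
    double decodeFefp(int bits)
    {
        int sign = (bits >> 5) & 1;
        int a = bits & 31;
        double s = sign ? -1.0 : 1.0;
        if (a == 0)
            return s * 0.0;                   // the single "denormal": zero
        if (a == 31)
            return s * INFINITY;
        if (a >= 29)
            return NAN;
        return s * std::exp2((a - 12) / 4.0); // 2^((A-12)/4)
    }

    int main()
    {
        for (int i = 0; i < 32; i++)
            std::printf("%2d  %+.4f\n", i, decodeFefp(i));
    }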

Would it be practical for CPUs to actually do computations with these numbers? Surprisingly, yes - though the trade-offs are different. Multiplication and division of FEFP numbers is just addition and subtraction (of the fixed point fractional exponent). Addition and subtraction are more complicated - you need to actually exponentiate the representations, add or subtract the results and then take the logarithm to get the representation of the sum or difference. That sounds really hard, but using the CORDIC algorithms and some of the trickery currently used to add and subtract floating point numbers, I think additions and subtractions could be done in about the same time and using roughly the same number of transistors as multiplications and divisions of floating point numbers currently use. What's more, it might be possible to reuse some of the add/subtract logic to actually compute exponentials and logarithms (which are themselves quite useful and often implemented in hardware anyway).
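As a rough illustration of those trade-offs (not a model of how hardware would actually do it), here's what the arithmetic looks like on the biased 5-bit A field of the toy format above; I've leaned on the math library where real hardware would use CORDIC-style shift-and-add iterations, and ignored zero, infinity, NaN, signs and overflow entirely:

    #include <cmath>

    const int k = 4, bias = 12;   // a value is 2^((A - bias) / k)

    // Multiplication and division are just integer addition and subtraction of A.
    int fefpMul(int a, int b) { return a + b - bias; }
    int fefpDiv(int a, int b) { return a - b + bias; }

    // Addition needs an exponential and a logarithm:
    //   2^((c-bias)/k) = 2^((a-bias)/k) + 2^((b-bias)/k)
    int fefpAdd(int a, int b)
    {
        double sum = std::exp2((a - bias) / double(k)) +
                     std::exp2((b - bias) / double(k));
        return bias + int(std::lround(k * std::log2(sum)));
    }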

So I actually think that FEFP is superior to IEEE754 in several ways. The latter is sufficiently entrenched that I doubt that it could ever be replaced completely, but FEFP might find some use in special-purpose hardware applications where IEEE754 compatibility doesn't matter, such as signal processing or graphics acceleration.

Spam for buying instead of selling

September 19th, 2012

Most unsolicited commercial email tries to get its readers to buy something, but recently I had some spam that was the other way around - they wanted to give me money! Specifically, they wanted to buy ad space on one of my sites. Now, given that I don't have any costs involved in running the site in question (the hosting and domain name were donated), I don't feel I have the right to put ads on the site. I wouldn't want to anyway - it's a better site for not having ads, and it's not like the amount of money involved is likely to make the slightest bit of difference to the quality of my life. For the same reasons I don't have ads on this site (maybe if I could make enough money from the ads to give up my day job it would be a different story, but none of my websites has anywhere near that amount of traffic!).

Even so, it took me a moment to decide whether to reply with a polite "no thank you" or to press the "report as spam" button. In the end I decided on the latter course of action - it's still unsolicited commercial email even if it's buying instead of selling. And one of the last things the internet needs is more advertising. It's kind of interesting though that advertisers (or ad brokers at least) are starting to court webmasters - it always seemed to me that the supply of advertising space on the internet vastly outstripped the demand, but maybe that's changing.

Similarly, my web host recently sent me a coupon for $100 of free Google ads - I have no use for them myself (I don't think anyone interested in what I have to say here is going to find it via an ad) but I hope I'll be able to donate them to someone who does have a use for them.

Recompiling only the code that has changed

September 18th, 2012

Designing an entirely new computer language is a ridiculously ambitious thing to attempt as a personal side project, but even so it apparently isn't ambitious enough on its own for me - oh no, I want to write a complete compiler for it as well.

Why would I go to all the trouble of doing that rather than just writing a GCC front end, starting with LLVM or even writing it as a translator to another language?

Well, the short answer is that I want to explore some parts of the design space of both compiler design and languages that I feel have so far gone under-explored. In other words, play around and have some fun.

I do have some specific ideas under that umbrella that I want to try out. Some of these ideas rely on having the entire compiler for my language written in the language itself.

Here is one specific example I've been vaguely thinking about. Suppose we have a compiler that translates ALFE code into an executable binary program. One problem with it is likely to be compilation speed - I'm eliminating separate compilation which is what makes incremental builds possible in C and C++. But I'd also like to have the edit-compile-test cycle be as quick as possible, even for large programs. For large programs, even C/C++ incremental builds are slow because the entire link step has to be done from scratch.

So how can ALFE, which doesn't even have separate compilation, have incremental build times even close to that of C/C++?

What I really need seems to be a daemonic compiler - one that hangs around in memory all the time and keeps all the state from the previous build so that instead of accepting a complete input file it accepts a diff between the input currently in memory and the desired input. The output could be a binary diff, but for most of these edit-compile-test cycles we won't even care about the output binary - what we'll really want is a diff in the list of which testcases passed and which failed. It would be great if the compiler could figure out which testcases could possibly be affected by an input diff and only rerun those testcases. It might even be possible to make it sufficiently fast that it could (asynchronously) recompile with each keystroke in a text editor, so you'll know instantly if your fix makes the broken test pass and if it breaks any other tests. That would be a fantastic boon to productivity - with the right set of tests, updates can be nearly instantaneous once the problem is found!

Such a system would improve reliability of software as well - if the edit-compile-test cycle is that much shorter then any manual testing would be a much greater fraction of it, so programmers would have a much greater incentive to make automated testcases for every bug fix, and test coverage would rapidly improve.

Writing a compiler is hard enough on its own, but taking a normal compiler and modifying it to transform diffs to diffs seems to be a task of similarly daunting magnitude. But that's not necessarily the case if we have a compiler that is written in the language that it understands. Modifying the compiler to make it output a diff-to-diff compiler should, I think, be much easier than modifying the compiler to be a diff-to-diff compiler, since I wouldn't have to modify every part of the compiler - it would be a pass on the intermediate form that transformed transformations from f(x)->y form to f(x+dx)->y+dy form. Of course, that's still much easier said than done, but I think it takes it from the realm of the impossible to the realm of the merely very difficult.
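To illustrate the f(x)->y versus f(x+dx)->y+dy distinction with something much smaller than a compiler, here's a toy C++ example (entirely my own, and nothing to do with ALFE's actual design) in which the "compiler" just builds a word-frequency histogram; the diff-to-diff version looks only at the change to the input and produces the change to the output:

    #include <map>
    #include <string>
    #include <vector>

    using Histogram = std::map<std::string, int>;
    using HistogramDiff = std::map<std::string, int>;  // deltas to apply to counts

    struct InputDiff {
        std::vector<std::string> removed;  // words deleted from the source
        std::vector<std::string> added;    // words inserted into the source
    };

    // The ordinary "batch" compiler: recompute everything from scratch.
    Histogram compile(const std::vector<std::string>& words)
    {
        Histogram h;
        for (const auto& w : words)
            ++h[w];
        return h;
    }

    // The diff-to-diff version: given only the change to the input, produce the
    // change to the output without looking at the rest of the source.
    HistogramDiff compileDiff(const InputDiff& d)
    {
        HistogramDiff delta;
        for (const auto& w : d.removed)
            --delta[w];
        for (const auto& w : d.added)
            ++delta[w];
        return delta;
    }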

Using one scrolling window inside another is annoying

September 17th, 2012

I'm very impressed with WordPress as a piece of software in general, and particularly at how it's managed to implement a text editor inside a web browser. However, as a text editor it's kind of annoying. It doesn't work terribly well in Internet Explorer for one thing. Even on Firefox and Chrome, it suffers from "one scrolling window inside another" syndrome, which is common in many complicated "web applications". Once you've got more than a screenful of text in the edit box, you get an inner scrollbar to scroll it up and down. Being a keyboard-oriented sort of person, I usually use the PgUp and PgDn keys for this. PgDn works fine on the inner edit box until you get to the bottom, at which point it stops swallowing the PgDn keypresses and passes them on to the outer window, which then scrolls the top of the edit box off the window. I can't undo this with PgUp because that goes to the inner edit box! This behavior breaks the idea that PgUp and PgDn should do opposite things.

One way to fix this would be to make the edit box expand to fit the text instead of scrolling. There's probably a plugin for this (I've been able to solve just about every other WordPress problem I've had by installing a plugin) but I have no idea how to find it - the obvious searches didn't help. So most of the time I write blog posts in my preferred editor and paste them in. However, I still end up doing a pass over the text in WordPress to fix up the line breaks.

C++ features ALFE doesn't have

September 16th, 2012

ALFE's closest ancestor is probably C++ (or possibly D, though I don't have much experience of that language), though it also draws inspiration from such disparate corners as C#, PHP and Haskell. But there are several C++ features I'm planning on leaving out of ALFE:

  • C compatibility - this is the cause of a great many of the complexities of C++, though it is quite probably also the only reason why C++ has been as successful as it has.
  • Separate compilation - another huge source of complexities in C++. The ALFE compiler will have a built-in linker, so it will be able to generate an executable binary in a single invocation. Actually I'm not entirely getting rid of separate compilation - you'll still be able to create and consume object files, but doing so will require explicitly specifying the ABI to use. The problems that are solved by separate compilation I'm planning to solve in different ways, which I'll write about in future posts.
  • Declarations. Declaring something separately from its definition is useful if you have separate compilation, or if you're writing a compiler that can compile programs too big to fit in RAM. RAM is pretty cheap nowadays so this isn't a particularly important use case - certainly not important enough to make programmers duplicate information and have an extra place to modify things when they change. The only things which look like declarations in ALFE will be for things like calling a function declared in an external binary, such as for platform ABI calls. I go to great lengths to minimize the number of declarations in my C++ programs.
  • The preprocessor. Well, macros specifically (anything you might want to use macros for in C or C++ will be done better with features of the core language in ALFE). Header files will still exist but will have rather different semantics. In particular, they won't be included until the code in them is needed - if you put an include statement inside a class definition and then no instance of that class is ever used, the file won't even be opened. That should drastically improve the time it takes to compile files while making it very simple to use the standard library (whose classes will all be included this way by default).
  • The post-increment and post-decrement operators. These do have some merit in C - you can write things like "while (*p != 0) *q++ = *p++;" much more concisely than would otherwise be possible. The first chapter of Beautiful Code has a nice example. However, I don't think these bits of code should be the ones we're making as concise as possible - such a piece of code should be written once, wrapped up as a highly optimized library function and thoroughly documented. (A version of that loop with and without post-increment appears after this list.)
  • Digraphs and trigraphs. You'll need the full ASCII character set to write ALFE, which is probably not much of a hardship any more.
  • Unexpected exceptions (and run-time exception specifications). Exception specifications are a great idea as a compile time check but they make no sense as a run-time check, which is why most coding standards recommend against using them. But the reason that C++ needs exception specifications to be a run-time check instead of a compile-time check is separate compilation - there's no way to tell what exceptions will actually be propagated from any particular function. With compile time exception specifications it's trivial to force the exception specification for destructors to be "nothrow" and thereby make propagating an error from a destructor be a compiler error, which is a much better solution than the "unexpected exception" mechanism in C++.
  • Multiple inheritance. I gave this one a lot of thought before deciding on it. Multiple implementation inheritance is something that seems to get used very little in proportion to the amount by which it complicates the language. I've only seen one even slightly compelling example (the BBwindow etc. classes in chapter 12 of The C++ Programming Language by Stroustrup) and the rationale for that goes away when you have control over all the parts of the program (i.e. you're not consuming code that you can't modify). ALFE actually goes one step further and disallows multiple interface implementation as well on the grounds that if you have a single class implementing more than one interface, it's a violation of the separation of concerns. From an implementation point of view, inheritance is a special case of composition - the parent object is just in the first position. That suggests a good way to simulate multiple interface inheritance - compose the class out of inner classes which implement the various interfaces - pointers to these inner classes are retrieved via method calls rather than casts (a sketch of the pattern appears after this list). I might even add some syntax to streamline this if it gets too verbose. And I might still change my mind and introduce multiple (interface) inheritance if my plan doesn't work.
  • The main() function. Instead the statements in the file passed to the compiler are executed in order from the top - i.e. there is an implicit "main()" wrapped around the entire compilation unit. Command line arguments and environment variables are accessed via library functions (or possibly global variables), which means they don't have to be passed through the call chain from main() to whatever piece of code interprets the command line arguments. That greatly reduces the amount of boilerplate code.
  • public/private/protected/friend. Instead I'm planning to implement "accessibility sections", which are kind of like public/private/protected but specify which classes are allowed to access the members in the section. That way you don't have to give all your friends access to all your private parts, which is possibly the most innuendo-laden complaint about C++.
  • int to bool conversion. The condition in "while" and "if" statements must have boolean type. If you want C semantics you have to add "!= 0" explicitly. I've been doing this anyway in my own C/C++ code and I don't find it to be a source of excess verbosity - if anything it clarifies my intent.
  • Assignments (and increment/decrement) as expressions. This solves the "if (a=b)" problem that C and C++ have, in a far more satisfactory way than the "if (1==b)" abomination that some recommend. I've also been avoiding using assignments as expressions in my C and C++ code and I don't miss them - I think the code I write is clearer without them.
  • C strings (responsible for so many security bugs). ALFE will have a first-class String type, possibly consisting of a pointer to a reference counted buffer, an offset into that buffer and a length, but I'll probably try a few things to see what performs best, and the compiler might even decide to use different representations in different places if there's a performance advantage in doing so.
  • References. I can see why these were necessary in C++ but I think for ALFE they will be unnecessary because of the lack of a fixed ABI, and the "as if" rule (the compiler is allowed to use whatever implementation strategy it likes as long as the semantics don't change). So the rule for deciding how to pass an argument is simple: pass by value if it's strictly an input, pass by pointer if the function might need to change it. The compiler will decide whether to actually pass by value or by pointer depending on what is fastest on the target architecture for the kind of object being passed.
  • Array to pointer decay. In ALFE, fixed-length arrays are first class types, so passing one to a function declared as (e.g.) "Int foo(Int[4] x)" passes by value, i.e. copies the entire array. Getting a pointer to the first element requires explicitly taking its address: "Int* p = &x[0];"
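As promised above, here's the copy loop from the post-increment bullet in plain C++ (not ALFE), first in its terse form and then without the post-increment operators:

    // The loop from the bullet point above, with and without post-increment.
    void copyWithPostIncrement(const char* p, char* q)
    {
        while (*p != 0)
            *q++ = *p++;
    }

    void copyWithoutPostIncrement(const char* p, char* q)
    {
        while (*p != 0) {
            *q = *p;
            ++q;
            ++p;
        }
    }

And here's a C++ rendering of the composition pattern from the multiple inheritance bullet (again my own illustration with made-up names - ALFE's syntax for this doesn't exist yet): the class implements each interface through an inner class and hands out pointers to those inner objects via methods rather than casts.

    struct Drawable { virtual void draw() = 0; virtual ~Drawable() = default; };
    struct Serializable { virtual void save() = 0; virtual ~Serializable() = default; };

    class Widget
    {
    public:
        Drawable* drawable() { return &_drawable; }
        Serializable* serializable() { return &_serializable; }
    private:
        struct DrawablePart : Drawable {
            void draw() override { /* draw the widget */ }
        } _drawable;
        struct SerializablePart : Serializable {
            void save() override { /* serialize the widget */ }
        } _serializable;
    };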

ALFE design philosophies

September 15th, 2012

Every computer language has its design philosophies. Here are ALFE's:

  • Imperative rather than purely functional. This, this and this have some good points about why functional programming isn't a panacea. However, I'm hoping to implement some features which help to keep the purely functional parts of a program separate from the parts with side effects - the compiler will be able to keep track of which functions are supposed to be pure and prevent them from calling impure functions (with appropriate escapes for things like debugging and encapsulating high performance libraries behind functional interfaces).
  • Statically typed rather than dynamically typed. A type annotation is like a compiler-enforced comment and a unit test rolled into one, at an almost trivial cost. In fact, being used to static typing I now find that when I'm programming in a dynamic language I spend more time wondering what values a variable is supposed to have and tracking down errors that the type system would have caught than I do typing and modifying type annotations in static languages. Plus they're not even compulsory - I'm planning to give ALFE a Variant type which can hold any value (although most of the standard library functions will probably be statically typed so dynamic typing won't have first-class status). Some form of Hindley-Milner style type inference is also something I'd like to do.
  • Strict rather than lazy evaluation. ALFE is a high performance language and general performance characteristics are considered to be an observable behavior. Of course, lazy evaluation as a strategy for particular pieces of code is still possible - I remember when I first understood lazy evaluation, reading about strategies for implementing continued fraction arithmetic in HAKMEM, and thought it was tremendously clever, though unfortunately I've never had a good reason to implement it.
  • No garbage collection. Also, the space of garbage collected languages has been pretty thoroughly explored.
  • Rich in features at the expense of simplicity and ease of learning. In fact, if this language is so complicated that I'm the only one who can use it, that would not in itself cause it to fail to achieve what I'd like to achieve with it.
  • It's primarily a compiled language for maximum speed. Though I am currently planning on having a built in interpreter for the build system. A REPL might come in handy too.
  • The core language is not minimalist - even if some feature could be added as a library, it may still get language support if that would improve it in some way.
  • Text is for input and output, not processing.

I might come back and add some more things to this list as I think of them.

Unity language renamed to ALFE

September 14th, 2012

As regular readers of this blog know, I have a long-term on-and-off project to design my own computer language in which I can be super productive at all parts of the development cycle. I have occasionally referred to this language as Unity (given my hope that it will be useful for many purposes), but since picking that name I have learned of several other software projects by that name (including another programming language), so I've renamed my project to ALFE. I've updated all the references I could find (including the URLs, which I didn't think I'd be able to do - WordPress has a plugin for everything). If you spot any more lurking in the dark corners of the blog please let me know.

ALFE stands for "A Language For Everything". Which has more than one meaning. One is that ALFE itself can be used for any purpose. Another is that (via the built in parser generator and code generator) you can use it to create powerful domain-specific languages.

ALFE is also the name of the standard library that is included with the language, though when it's referring to the library, it stands for "A Library For Everything".

Removing outliers

October 17th, 2011

If you've got a sequence of data that you want to do some statistical analysis on, but you know that some of it is bad, how do you remove the bad data? You could just remove the top 5% and bottom 5% of values, for example, but maybe you don't have any bad data (or maybe you have lots), and that would adversely affect your measurement of the standard deviation.

Suppose that you know that your dataset is supposed to follow a normal distribution (lots do). Then you could remove outliers by measuring the skew and kurtosis (3rd and 4th moments), and just repeatedly remove the sample furthest from the mean until these measurements look correct. This algorithm is guaranteed to terminate since if you only have 2 samples the skew and kurtosis will be 0. You've still got a parameter or two to tune though (how much skew and kurtosis you'll tolerate).
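Here's a sketch of that procedure in C++ (my own interpretation of the description above; the tolerances and the minimum sample count are exactly the tuning parameters just mentioned):

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Repeatedly drop the sample furthest from the mean until the skew and
    // excess kurtosis of what's left look like those of a normal distribution.
    void removeOutliers(std::vector<double>& samples,
                        double skewTolerance = 0.1,
                        double kurtosisTolerance = 0.5,
                        std::size_t minimumSamples = 3)
    {
        while (samples.size() > minimumSamples) {
            double n = static_cast<double>(samples.size());
            double mean = 0;
            for (double x : samples)
                mean += x;
            mean /= n;
            double m2 = 0, m3 = 0, m4 = 0;
            for (double x : samples) {
                double d = x - mean;
                m2 += d * d;
                m3 += d * d * d;
                m4 += d * d * d * d;
            }
            m2 /= n; m3 /= n; m4 /= n;
            if (m2 == 0)
                break;                                // all remaining samples equal
            double skew = m3 / std::pow(m2, 1.5);
            double excessKurtosis = m4 / (m2 * m2) - 3;
            if (std::abs(skew) <= skewTolerance &&
                std::abs(excessKurtosis) <= kurtosisTolerance)
                break;                                // moments look normal enough
            // Remove the sample furthest from the mean and try again.
            samples.erase(std::max_element(samples.begin(), samples.end(),
                [mean](double a, double b) {
                    return std::abs(a - mean) < std::abs(b - mean);
                }));
        }
    }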

If the world were rational

October 16th, 2011

Sometimes I wonder what the world would be like if it were possible to get everybody to agree on true things.

Suppose that, whenever two people find that they disagree about something, they invoke the Argulator algorithm to resolve the dispute. The Argulator algorithm is trivial for statements which are empirically true or false - you just go and do the experiment to find out who is right. For other statements, each side enumerates each argument for their position as a set of statements, then figures out which statements in each of their opponent's arguments they disagree with and recursively invokes the Argulator algorithm on each of those.

The Argulator algorithm isn't guaranteed to terminate, though - you might have two positions which are perfectly internally consistent and consistent with the observable universe, but which still disagree with each other. For example, two people might disagree about the purpose of government - one might argue that its purpose is to provide a safety net or basic minimum standard of living for the least well-off members of society. Another might argue that its purpose is to further the cause of human achievement (maximize the rate of total long-term economic output) by any means necessary. Without agreement on what the end goal is, the chances of agreeing on a course of action are pretty slim.