Archive for the ‘computer’ Category

UDRE doesn't do what I thought it did

Friday, October 7th, 2011

I was having a very frustrating problem trying to make my Arduino behave as the keyboard for my XT - one particular piece of code (a 1ms delay loop) seemed to be taking much longer to run on the Arduino than it should, causing the Arduino to think that the clock low pulse from the XT was shorter than it really was, and therefore to fail to recognize the "reset keyboard" command.

After much hair-pulling, I tracked it down to the UART (Universal Asynchronous Receiver/Transmitter) Data Register Empty (UDRE) interrupt, which fires to tell the program that a space has been cleared in the UART transmission buffer so the program can queue up the next byte that needs to be sent over serial (which is usually nothing, since the Arduino doesn't need to send anything to the host PC at that point). Or at least, that's what I thought it did - it turns out that it's not "a space has been cleared" but "a space is available" - i.e. it's level-triggered, not edge-triggered. That means that if we don't put anything out onto the serial line when the interrupt is received, it'll just fire again as soon as interrupts are turned on again. That explains why my 1ms delay loop was taking 20ms - it was spending most of its time servicing spurious interrupts instead of counting down. The fix was simple: just use the "transmit complete" interrupt instead of UDRE.
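
To illustrate the difference, here's a minimal sketch assuming an ATmega328-class AVR and avr-libc register names (not the actual keyboard code): an empty UDRE handler re-fires continuously while the transmitter is idle, so it either has to send something or mask itself off, whereas the transmit-complete handler only fires once per byte actually sent.

    #include <avr/io.h>
    #include <avr/interrupt.h>

    volatile uint8_t pendingByte;
    volatile uint8_t havePending = 0;

    // Level-triggered: fires whenever UDR0 is empty, i.e. continuously when idle.
    ISR(USART_UDRE_vect)
    {
        if (havePending) {
            UDR0 = pendingByte;
            havePending = 0;
        }
        else
            UCSR0B &= ~_BV(UDRIE0);  // nothing to send - mask it off or it fires forever
    }

    // Fires once after a byte has actually gone out - safe to leave enabled when idle.
    ISR(USART_TX_vect)
    {
        // queue up the next byte here, if there is one
    }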

One reason this was so hard to debug was that every time I tried to add code to investigate it (by sending debugging information over the serial line) the problem went away - which is obvious in retrospect: if the serial line is busy most of the time, the interrupt doesn't fire so often. Sometimes thinking hard about what the problem could be is actually quicker than doing experiments to track it down.

Having found that, I realized that I had written some other code which assumed the UDRE interrupt was edge-triggered - the code for my Physical Tone Matrix. I made the same change there and now the menu screen appears and disappears instantly rather than sweeping down the screen like a DOS directory listing on an 8088. Unfortunately that means the menu switch is now too sensitive - it flickers on and off while you're pressing it, so you sometimes have to press it a few times to get it into the state you want. I'll change the capacitor for a larger one next time I have it open.

Subpixel image resampling results

Tuesday, October 4th, 2011

I implemented my cleartype for images idea and here are the results.




(This image, by the way, is part of the schematic of the original IBM CGA card - specifically, the composite DAC stage which I've mentioned before.)

The first image was resampled by Paint Shop Pro 4.12 (I know it's ancient, but I never got on with any of the newer versions). This resampling algorithm works directly on the sRGB values (it's not gamma corrected) so whenever there's a grey pixel in the output image, it's too dark.

The second image was resampled using proper sRGB<->linear conversion and an ideal sinc filter (which means it's very slow - several seconds per image). If you zoom into this, you'll notice some "ripples" around the horizontal and vertical lines which are due to the band-limited resampling - getting rid of them would require higher spatial frequencies than this image can accurately reproduce, so really the image is about as correct as it can be. Because the ripples are clipped on the high side, this does make the image slightly darker on average than it should be, but the difference isn't noticeable to the human eye.
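
For reference, the sRGB<->linear conversions involved are just the standard transfer functions (a generic sketch, not the code used for these images); all of the filtering is done on the linear values:

    #include <cmath>

    // Standard sRGB transfer function and its inverse (component values in [0,1]).
    double srgbToLinear(double s)
    {
        return s <= 0.04045 ? s / 12.92 : std::pow((s + 0.055) / 1.055, 2.4);
    }

    double linearToSrgb(double l)
    {
        return l <= 0.0031308 ? l * 12.92 : 1.055 * std::pow(l, 1.0 / 2.4) - 0.055;
    }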

The third image was resampled by the "cleartype for images" algorithm (again with a sinc filter in linear space) using a band-limit at the subpixel resolution. As you can see, it's noticeably sharper but does have annoying "fringes" of colour as thin lines move from one subpixel to another.

The fourth image is the same as the third except that the band-limit is that of the pixel resolution rather than that of the subpixel resolution (but the filters are still centered on the subpixels for the corresponding channels). This completely eliminates the fringing (yay!). As expected, it's not actually sharper than the non-cleartype image but some of the near vertical lines are smoother and the text is much more legible than any of the other versions.
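
Here's a one-dimensional sketch of that fourth variant (function names invented, naive unwindowed sinc, no attention to speed): every channel is band-limited at the pixel rate, but the red and blue sample positions are offset by a third of a pixel either side of green to match the subpixel stripes.

    #include <cmath>
    #include <vector>

    static double sinc(double x)
    {
        static const double pi = 3.14159265358979323846;
        return x == 0 ? 1.0 : std::sin(pi * x) / (pi * x);
    }

    // Resample one channel (already converted to linear light) at source position x.
    // The filter is band-limited at the pixel rate even though x is a subpixel centre.
    double sampleChannel(const std::vector<double>& src, double x, int radius = 16)
    {
        double acc = 0, weightSum = 0;
        int centre = static_cast<int>(std::floor(x));
        for (int i = centre - radius; i <= centre + radius; ++i) {
            if (i < 0 || i >= static_cast<int>(src.size()))
                continue;
            double w = sinc(x - i);
            acc += w * src[i];
            weightSum += w;
        }
        return acc / weightSum;
    }

    // For output pixel p with scale s (source pixels per output pixel):
    //   red   = sampleChannel(srcRed,   (p - 1.0/3.0) * s);
    //   green = sampleChannel(srcGreen,  p * s);
    //   blue  = sampleChannel(srcBlue,  (p + 1.0/3.0) * s);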

Here's the same four methods with a colour image:




First let's look at the one with incorrect gamma - notice that a lot of the sharpness in the detailed regions is gone.

The second image is much better - the details remain sharp.

The third image is a mess - details exhibit aliasing problems because when the image has detail in just one channel, the band-limit frequency is three times the pixel frequency.

The fourth image is pretty similar to the second but there are a few places where sharp, near-vertical lines are a bit smoother, for example the brightly lit part on the left of the hat.

So, I think this experiment is a success. The downside is that I now need to go back through all my previous blog posts and resample the images again to improve them.

What is the CGA aspect ratio exactly?

Monday, October 3rd, 2011

Somebody asked on the Vintage Computer Forums about what the CGA aspect ratio is supposed to be. The answer is usually given as 4:3 (a pixel aspect ratio of 5:6), but it inspired me to find out exactly what the relevant standards say it ought to be.

The relevant standard in question is SMPTE 170M - the composite analogue video signal standard upon which CGA is based. This gives an aspect ratio of 4:3, but that is for the full composite picture, which is 242.5 lines rather than the CGA's 200. The width is given in terms of timings - 63.556 microseconds per scanline in total, minus 1.5+9.2 microseconds for the blanking period (which has a tolerance of +0.3/-0.2 microseconds), leaving an active width of between 52.556 and 53.056 microseconds. Since the full horizontal period corresponds to 455 CGA low-res pixels, the full NTSC active area is the equivalent of (376.25-379.83)x242.5 CGA pixels. Re-arranging, that gives a screen aspect ratio for CGA of between 1.362 and 1.375 - slightly wider than the usually quoted value.
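
Spelling out that arithmetic (just a sanity check of the numbers above, not anything taken from the standard itself):

    #include <cstdio>

    int main()
    {
        const double line = 63.556;                         // total scanline period (us)
        const double activeMin = line - (1.5 + 9.2 + 0.3);  // 52.556us of active width
        const double activeMax = line - (1.5 + 9.2 - 0.2);  // 53.056us of active width
        const double pxMin = 455.0 * activeMin / line;      // ~376.25 CGA low-res pixels
        const double pxMax = 455.0 * activeMax / line;      // ~379.83 CGA low-res pixels
        // The full 4:3 picture is (pxMin..pxMax) x 242.5; CGA displays 320x200 of it.
        const double arMin = (4.0 / 3.0) * (320.0 / pxMax) * (242.5 / 200.0);
        const double arMax = (4.0 / 3.0) * (320.0 / pxMin) * (242.5 / 200.0);
        std::printf("%.3f to %.3f\n", arMin, arMax);        // prints 1.362 to 1.375
        return 0;
    }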

However, no TV or composite monitor of the time was manufactured to have aspect ratio tolerances as precise as 3% - 4:3 would have been well inside the error bars.

How to set 200-line text modes on EGA

Sunday, October 2nd, 2011

If you tell an EGA card that it's connected to a monitor capable of 350-line modes by setting the appropriate switches on the card itself, it will by default use these 350-line modes for its text mode (using the 14-scanline character set instead of the 8-scanline one, yielding higher fidelity text).

But sometimes you want a 200-line text mode. In particular, there is an obscure 160x100 16-colour mode of the CGA which was obtained by using 80-column text mode, filling the text characters with the "left vertical bar" or "right vertical bar" characters (221 and 222 respectively), disabling blinking and setting up the CRTC for 100 visible rows of 2-scanline characters. This was used by some Windmill Software games and a few others. With VGA you can do the same thing with 400-line text mode (by using 100 visible rows of 4-scanline characters).

But how do you do it with EGA? One way is to program all the registers directly (as one must do on CGA) using the timings, sync polarity and palette from 200-line graphics modes and all other settings from the 350-line text modes. As mentioned in yesterday's post, we must use the palette 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17 instead of the usual 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x14, 0x07, 0x38, 0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f because the monitor will interpret bit 4 as intensity instead of secondary green in 200-line modes.
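
In code, loading that palette is a single BIOS call (a sketch in Borland-style real-mode C, not tested code; the 17th entry is the overscan/border colour):

    #include <dos.h>

    void load200LinePalette()
    {
        static unsigned char palette[17] = {
            0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
            0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
            0x00                        // overscan (border) colour
        };
        union REGS r;
        struct SREGS s;
        r.x.ax = 0x1002;                // int 0x10: set all palette registers
        r.x.dx = FP_OFF(palette);       // ES:DX -> 17-byte palette table
        s.es = FP_SEG(palette);
        int86x(0x10, &r, &r, &s);
    }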

Another way (which may be either more or less compatible with clone EGA cards) is to fool the BIOS into thinking a 200-line monitor is connected. The EGA BIOS reads the card's switches only once at boot time and then stores the results in BIOS memory at 0x40:0x88 and uses this value instead of the hardware value at mode-setting time. The low nybble of the value at this location is 3 or 9 for 350-line monitors and the corresponding values for 200-line monitors are 2 and 8 respectively. So an alternate algorithm is to check the byte at this address, decrement the low nybble if it's 3 or 9, store it back, do the "int 0x10" to set text mode, set the Maximum Scan Line Register to 1, disable blink and fill the text characters with a vertical bar.
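
In the same Borland-style real-mode C, that second method looks roughly like this (an untested sketch - the port addresses and register indices are the standard EGA ones, but treat the details as illustrative):

    #include <dos.h>

    void setEga160x100()
    {
        unsigned char far* egaInfo = (unsigned char far*)MK_FP(0x40, 0x88);
        unsigned char low = *egaInfo & 0x0f;

        // Fool the BIOS into thinking a 200-line monitor is attached.
        if (low == 3 || low == 9)
            *egaInfo = (*egaInfo & 0xf0) | (low - 1);

        union REGS r;
        r.x.ax = 0x0003;                // int 0x10: set 80x25 colour text mode
        int86(0x10, &r, &r);

        // Maximum Scan Line Register (CRTC index 9) = 1 gives 2-scanline
        // characters, i.e. 100 visible character rows.
        outportb(0x3d4, 0x09);
        outportb(0x3d5, 0x01);

        r.x.ax = 0x1003;                // int 0x10: disable blink so all 16
        r.x.bx = 0x0000;                // background colours are usable
        int86(0x10, &r, &r);

        // Fill text memory with the "left vertical bar" character (221); each
        // attribute byte's low nybble then colours the left half-pixel and the
        // high nybble the right.
        unsigned char far* vram = (unsigned char far*)MK_FP(0xb800, 0);
        for (unsigned i = 0; i < 80u * 100u; i++)
            vram[i * 2] = 221;
    }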

Here is the Vintage Computer Forum thread that inspired me to find this out.

Why the EGA can only use 16 of its 64 colours in 200-line modes

Saturday, October 1st, 2011

This was a question which puzzled me when I first found out about it, but now that I understand all the history behind it, it makes perfect sense.

The IBM PC (5150) was originally designed with output to an NTSC television in mind - hence the 4.77MHz clock speed (4/3 the NTSC colour carrier frequency, allowing the video output and CPU clock to share a crystal). It was thought that home users would generally hook their PCs up to the TV rather than having a separate, expensive monitor. Another major limiting factor in the design of the CGA was the price of video memory - the 16Kb on the card would have been fairly expensive at the time (it was as much as the amount of main memory in the entry-level PC). TV resolution is 912x262 at CGA 2-colour pixel sizes in non-interlaced mode, but TVs (especially CRTs) don't show all of the image - some of those scanlines and pixels are devoted to sync signals and others are cropped out because they would be distorted due to the difficulties of approximating high-frequency sawtooth waves with high-voltage analogue circuitry. So the 320x200 4-colour and 640x200 2-colour packed-pixel modes were chosen because they were a good fit for both 16Kb of memory and TV resolutions.

That system did work quite well for many home users - lots of CGA games have 16-colour composite output modes. But it wasn't so good for business users. These users tended not to care so much about colour but did care about having lots of columns of text - 80 was a common standard for interfacing with mainframes and for printed documents. But 80-column text on a TV or composite monitor is almost completely illegible, especially for colour images - alternating columns of black and white pixels in a mode with 320 pixels horizontally get turned into a solid colour by NTSC. So for business users, IBM developed a completely separate video standard - MDA. This was a much simpler monochrome text device with 4Kb of memory - enough for 80 columns by 25 rows of text. To display high quality text, IBM decided on a completely different video timing - 370 scanlines (350 active) by 882 pixels (720 active) at 50Hz, yielding a 9x14 pixel grid for high-fidelity (for the time) character rendering. In terms of timings, the character clock is similar (but not identical) to that of the CGA 80-column text mode (presumably 16.257MHz crystals were the closest they could source to a design target of 16.108MHz). To further emphasize the business target of the MDA card, the printer port was built into the same card (a printer would have been de rigueur for a business user but a rare luxury for a home user). Business users would also usually have purchased an IBM 5151 (the green-screen monitor designed for use with MDA) and an IBM 5152 (printer).

CGA also had a digital TTL output for displaying high quality 16-colour 80-column text (at a lower resolution than MDA) on specially designed monitors such as the IBM 5153 - this seems to have been much more popular than the composite output option over the lifetime of these machines. The two cards used different memory and IO addresses, so they could coexist in the same machine - real power users would have had two monitors, one for CGA and one for MDA (and maybe even a composite monitor as well for games which preferred that mode). The 9-pin digital connectors for CGA and MDA were physically identical and used the same pins for ground (1 and 2), intensity (6), horizontal sync (8) and vertical sync (9), but CGA used pins 3, 4 and 5 for primary red, green and blue respectively whereas MDA used pin 7 for its primary video signal. MDA also used a negative-going pulse to indicate vertical sync while the CGA's vertical sync pulse is positive-going.

So for a while these two incompatible standards coexisted. The next major graphics standard IBM designed was the EGA, and one of the major design goals for this card was to be an upgrade path for both home and business users that did not require them to buy a new monitor - i.e. it should be compatible with both CGA and MDA monitors. This was accomplished by putting a 16.257MHz crystal on the card and having a register bit to select whether that or the 14.318MHz one would be used for the pixel clock (and by having the on-board video BIOS ROM program the CRTC appropriately). By 1984, it was not out of the question to put 128Kb of RAM on a video card, though a cheaper 64Kb option was also available. 64Kb was enough to allow the highest CGA resolution (640x200) with each pixel being able to display any of the CGA's 16 colours - these would have been the best possible images that CGA monitors such as the IBM 5153 could display. It was also enough for 4 colours at the higher 640x350 resolution - allowing graphics on MDA monitors. With 128Kb you got the best of both worlds - 16 colours (from a palette of 64) at 640x350.

IBM made a special monitor (the 5154) for use with the EGA. This monitor could display both 200-line and 350-line images (deciding which to use by examining the vertical sync pulse polarity), and allowed users to take advantage of all 64 colours available in 350-line modes. The video connector was again physically the same and pins 1, 3, 4, 5, 8 and 9 had identical functions, but pins 2, 6 and 7 were repurposed as secondary red, green and blue signals respectively, allowing all 64 possible colours. But IBM wanted this monitor to be compatible with CGA cards as well, which meant that in 200-line modes it needed to interpret pins 3-6 as RGBI instead of RGBg and ignore pins 2 and 7. So in 200-line modes the EGA card has to generate a 4-bit RGBI signal and leave pins 2 and 7 unused, even when a 5154 is attached.

I guess the designers thought that sacrificing 48 of EGA's colours in 200-line modes was a small price to pay for making the EGA monitor compatible with CGA cards. Presumably they thought that if you had an EGA card and an EGA monitor you would be using 350-line mode anyway, or be running legacy CGA software which wouldn't miss those extra colours.

One thing I haven't mentioned here is the PCjr graphics. For the purposes of the discussion above it's essentially the same as CGA (it has the same outputs) but it's more flexible and slower due to the use of system RAM as video RAM, as many 8-bit microcomputers did in the 80s.

No exceptions

Thursday, September 29th, 2011

Recently I was out riding my bike and I saw a car sporting this bumper sticker:
God Bless Everyone - No Exceptions!
All I could think was that the owner of that car must really like to use return codes for their error handling.

I bought an XT

Wednesday, September 28th, 2011

For a while now I've been wanting to write a cycle-exact emulator for the original IBM PC and XT machines that are the direct ancestors of the machine I'm writing this on (and, almost certainly, the machine you're reading this on). I've written an 8088 emulator with cycle timings that I think are plausible but I have no idea if they're correct or not. The exact details of the timings don't seem to be published anywhere (at least not on the internet) so the only way to determine if they are correct is to compare against real hardware.

So, when I saw a cheap XT for sale on eBay recently, I made a spur-of-the-moment bid, and won it. It is a bit beaten up - the case was bent in one place, the speaker cone was falling off and the end of one of the ISA slots was broken off. All these problems were easy to fix with a hammer and a bit of superglue, though. It lacks a keyboard and monitor, which makes it rather difficult to do anything useful with it, but it did come with all sorts of weird and wonderful cards:

  • Hyundai E40080004 MDA card with 66Kb of RAM. It's all discrete logic apart from the CRTC, RAM and a ROM which I think is a character ROM rather than a BIOS ROM (though I could be wrong). The amount of memory makes me suspect it can do graphics, but I can't find any documentation - I'd probably have to reverse engineer a schematic to find out exactly what it's capable of.
  • Tecmar Captain 200044 card with 384Kb RAM (bringing total to 640Kb), serial port, connector for a parallel port and possibly an RTC (it has a CR2032 battery on it anyway).
  • AST 5251/11 - apparently this enables PCs to connect, via Twinax, to a 5251 terminal server. It has the largest DIP-packaged IC I've ever seen - a Signetics N8X305N RISC Microprocessor running at 8MHz (16MHz crystal) with 6KB of ROM and 16KB of SRAM (12KB on a daughterboard) for its own purposes.
  • A generic serial+parallel card.
  • IBM floppy and MFM hard drive controller cards.
  • IBM serial card.
  • PGS scan doubler II. It seems to be pretty rare - there's only one mention of it on Google and I've never heard of such a device being used with PCs before (though I understand they were more popular in the Amiga-verse). It only uses the power and clock lines from the ISA bus - it has two CGA-style ports on the back, one for input and one for output. You loop the CGA card's output back into one of the ports, and the other one outputs a signal with twice the line rate (it buffers each line in its internal 2Kb of RAM and outputs each one twice, at double the pixel rate). I'm guessing the output had to go to a monitor with a CGA-style input which could run at VGA frequencies, which can't have been all that common.

It also comes with a floppy drive and a hard drive which makes the most horrendous metal-on-metal grinding sounds when it spins up and down (so I'll be very surprised if it still works).

This machine has clearly been a real working PC for someone rather than a perfectly preserved museum piece - it tells stories. At some point the original MDA card failed and was replaced with a clone, the RAM was upgraded, it's been used as a terminal for a mainframe, and at some point the owner read about this device that improves your video output, bought one, found it didn't work with his MDA card but just left it in there.

Due to lack of keyboard and monitor (neither my TV nor my VGA monitor can sync to MDA frequencies) I haven't got it to boot yet. I tried to use the scan doubler with the MDA and my VGA monitor (connecting the mono video signal via the red channel of the doubler) and the monitor was able to sync but the output was garbage - I guess the doubler is either broken or only works with CGA frequencies. If it does work with CGA then it'll be useful for seeing the TTL CGA output (though I'll have to put something together to convert the RGBI digital signals to analogue RGB - with proper fix up for colour 6 of course).

I ordered a CGA card but decided to see if I could jerry-rig something up in the meantime. I programmed my Arduino to pretend to be an XT keyboard and also the "manufacturing test device" that IBM used in their factories to load code onto the machine during early stage POST (it works by returning 65H instead of AAH in response to a keyboard reset). I then used this to reprogram the CRTC of the MDA to CGA frequencies (113 characters of 9 pixels at 16MHz pixel clock for 18 rows (14 displayed) of 14-scanline characters plus an extra 10 scanlines for a total of 262 scanlines). The sources for this are on github.
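
The CRTC setup boils down to a series of writes like the following rough sketch using the numbers above - the sync position and width values here are placeholders rather than the ones I actually used, so see the github sources for the real thing:

    #include <dos.h>

    static void crtcWrite(unsigned char index, unsigned char value)
    {
        outportb(0x3b4, index);   // MDA CRTC index port
        outportb(0x3b5, value);   // MDA CRTC data port
    }

    void mdaToCgaTimings()
    {
        crtcWrite(0, 113 - 1);    // R0: horizontal total (characters) - 1
        crtcWrite(1, 80);         // R1: horizontal displayed
        crtcWrite(2, 90);         // R2: horizontal sync position (placeholder)
        crtcWrite(3, 10);         // R3: sync width (placeholder)
        crtcWrite(4, 18 - 1);     // R4: vertical total (character rows) - 1
        crtcWrite(5, 10);         // R5: vertical total adjust (extra scanlines)
        crtcWrite(6, 14);         // R6: vertical displayed rows
        crtcWrite(7, 16);         // R7: vertical sync position (placeholder)
        crtcWrite(9, 14 - 1);     // R9: maximum scan line (14-scanline characters)
    }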

Next, I connected the hsync, vsync, video and intensity lines to a composite output stage like the one in the CGA card (it's just a transistor and a few resistors) and put some junk in video memory. Amazingly, it works - there is a barely legible picture. Even more amazingly, it is in colour! On the same breadboard I was doing this on, I had a Colpitts oscillator driving a 3.5795MHz (NTSC colour burst frequency) crystal from an earlier experiment (before my XT arrived I was trying to get colour TV output from my Arduino). This oscillator wasn't connected to anything, but the very fact that that frequency was bouncing around nearby was enough to turn off the colour killer circuit in the TV and allow it to interpret certain horizontal pixel patterns as colours.

The colours themselves aren't actually related to the picture in any useful way - the hsync and crystal aren't in sync so the phase (and hence the hues) drift over time. In fact, by applying slight heat and/or pressure to the crystal with my fingers, I can change the frequency slightly and make the hue phase drift rate faster or slower with respect to the frame rate, and even stop it altogether (though it's still not a multiple of the line rate so the colours form diagonal patterns).

The sync isn't quite right - because the hsync and vsync signals are just added together, the TV loses horizontal sync for 16 lines or so during vsync and then spends half the picture trying to recover. Unfortunately the CRTC in the MDA card has a vertical sync pulse width fixed at 16 lines but it needs to be closer to 3 for NTSC, so I haven't been able to get a stable signal even by XORing the signals like the CGA does. The CGA uses a 74LS175 to get a 3-line sync signal, but I don't have one of those to hand.

Here's the schematic for the circuit as I would have made it if I had the parts:

Unfortunately I haven't been able to continue the BIOS POST sequence after running this code - I tried jumping back into the BIOS at a couple of places but it just froze. I'll have to tinker with it some more to see if I can get it to work and determine where the POST is failing next.

I've determined that it should be possible for the XT to send data back over the keyboard line (the clock line is controllable by software). So I'm planning to do bidirectional communication between the XT and a host PC entirely over the keyboard port! I'm writing a tiny little OS kernel that will download a program over the keyboard port, run it, send the results back and then wait for another one.
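
The idea is that one of the bits of the XT's PPI port B (I/O port 0x61) gates the keyboard clock line, so software on the XT can wiggle that line and the Arduino can sample it - something like the sketch below, though the bit assignment and the framing are assumptions for illustration, not tested code:

    #include <dos.h>

    // Assumed: PPI port B bit 6 low holds the keyboard clock line low.
    const int ppiPortB = 0x61;
    const int kbdClockBit = 0x40;

    void sendBitOverClockLine(int bit)
    {
        unsigned char b = inportb(ppiPortB);
        if (bit)
            outportb(ppiPortB, b | kbdClockBit);   // release the clock line
        else
            outportb(ppiPortB, b & ~kbdClockBit);  // pull the clock line low
        // ...hold the level long enough for the Arduino to sample it...
    }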

Unfortunately my plans have been derailed because the power supply unit has failed. I noticed that the PSU fan wasn't spinning - I think that has caused some parts in the PSU to overheat. One resistor in there looked very burnt and the resistors for discharging the high-voltage smoothing capacitors were completely open circuit. I replaced all these but it's still not working. I've ordered a cheap ATX power supply and will be transplanting the guts into the XT PSU's box so that I can keep the big red switch.

Getting rid of GOTOs

Monday, September 26th, 2011

It's well known that it's possible to change a program that uses GOTOs into one that uses loops and conditionals if you're allowed to duplicate code or add more variables. One very easy way is just to change the program into a state machine, with one state for each label and a big loop around the whole thing. But under what conditions can you eliminate GOTOs if you can't do either?
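
For example (a made-up fragment, purely to illustrate the transformation):

    #include <cstdio>

    // The original control flow, expressed with gotos.
    void countdownWithGotos(int n)
    {
    top:
        if (n <= 0)
            goto done;
        std::printf("%d\n", n--);
        goto top;
    done:
        std::printf("done\n");
    }

    // The same flow as a state machine: one state per label, one big loop.
    void countdownAsStateMachine(int n)
    {
        enum State { Top, Done, Exit } state = Top;
        while (state != Exit) {
            switch (state) {
                case Top:
                    if (n <= 0) { state = Done; break; }
                    std::printf("%d\n", n--);
                    break;                          // stay in Top
                case Done:
                    std::printf("done\n");
                    state = Exit;
                    break;
                case Exit:
                    break;
            }
        }
    }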

The "Multiple break" example in the "More keywords" blog post is a piece of code that can't be de-GOTOized in C. I'm wondering if there are any examples of pieces of code that can't be de-GOTOized without multiple break, extra variables or code duplication. If I understand the abstract of Peterson 1973 correctly I think there aren't - that multiple break is enough to make GOTO redundant (but I can't find a non-paywalled copy of that article to check, and I don't care enough to spend $15 to find out).

Using templates when you don't really need to

Sunday, September 25th, 2011

Templated C++ classes are a bit nicer to use than non-templated classes, because one can define the entire class within its declaration without having to make sure that the types it uses are defined first (this check is done at instantiation time for template classes, but at parse time for non-template classes). I have found myself making classes templates when they don't really need to be, just so that I don't have to define member functions outside the declaration - i.e. when CTemplate doesn't actually use T for anything, and CTemplate is just used through a typedef like "typedef CTemplate<int> C;". T may be used as the template parameter to other classes defined this way, though.
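
Something like this (names invented for illustration): because Gadget_T<T> is a dependent type, the body of Widget_T::total() is only checked when Widget is instantiated, by which point Gadget_T has been fully defined even though it appears later in the file.

    template<class T> class Gadget_T;

    template<class T> class Widget_T
    {
    public:
        // Gadget_T<T> depends on T, so this body isn't checked until Widget_T is
        // instantiated - by which point Gadget_T has been fully defined below.
        int total(Gadget_T<T>* g) { return g->count() * 2; }
    };
    typedef Widget_T<void> Widget;

    template<class T> class Gadget_T
    {
    public:
        int count() { return 21; }
    };
    typedef Gadget_T<void> Gadget;

    int main()
    {
        Gadget g;
        Widget w;
        return w.total(&g);   // instantiation (and checking) happens here
    }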

Is Haskell too abstract?

Saturday, September 24th, 2011

The Haskell programming language implements a lot of very cool ideas (many of which I want to liberate for my programming language). However, Haskell seems to have a reputation for being a very difficult language to learn. The IO monad seems to be one particular sticking point, but this didn't seem to be particularly difficult to me - it's just a clever little hack (I just read lots of "How I came to understand the IO monad" accounts until I got it).

Another sticking point seems to be that there's a lot of stuff written about Haskell in the form of academic papers (with all the academic jargon that entails). That shouldn't be a surprise (since it's a language with origins in academia which seems to be only reluctantly escaping to industry), but it is kind of interesting that there's a sort of language barrier between academia and industry - what the former calls "deforestation hylomorphism" might be called "tree flattening" by the latter, for example.

But I think the real problem is something shared by other functional languages - it's just a different level of abstraction than we're used to thinking about. In imperative programming languages programmers can think "okay, what do I want the machine to do here?" and write that code. In functional programming languages one instead has to think about what the inputs and outputs are, write functions to perform those transformations and trust that the compiler will optimize it well (to a first approximation). It's much more like doing mathematics than imperative programming. Compilers will do all sorts of crazy things to turn those pure functions into high-performance imperative code. So when things are inevitably too slow, it's not obvious why (since the relationship between the generated code and the source code is very complicated) and it's difficult to understand how to make it faster.

Functional programming languages do have some interesting advantages over imperative languages, though - it's much easier to reason about pure functions than about imperative programs with all their attendant state, which means lots more interesting automatic optimizations are possible and (increasingly importantly) the code can be automatically parallelized so it takes maximum advantage of modern multi-core CPUs.

I'm not sure which side to place my money on. I suspect both imperative and functional paradigms will continue to exist for the foreseeable future. On the imperative side: the low-level OS and runtime components can't be written using functional languages, and optimizations and multi-core abstractions for imperative languages will continue to improve. On the functional side, compilers will require less and less hand-holding to generate good code and will generate better code than imperative compilers for more and more situations.