If one were to design a computer specifically for the purpose of having Oldskool-style demos written for it, what would it be like?
I came up with a few requirements:
- Oldskool "feel"
- Attractive output: sampled sound and realistic looking (still) images can be achieved very easily
- The more skilled the programmer, the better the effects that can be achieved - the best effects require very great skill
- Hardware as simple as possible
- Easy to emulate
- Fun to program
To get that Oldskool feel but still allow realistic looking still images, I decided to base it around a composite NTSC signal. It turns out there is a standard for digital composite NTSC signals - SMPTE 244M, so I just picked the 8-bit version of that. The video hardware is about as simple as video hardware could possibly be - just an 8-bit DAC that repeatedly cycles through a consecutive region of memory and dumps the bytes to the output at a frequency of 14.318MHz (4 times the color carrier frequency). This is also the CPU clock speed (the CPU is in sync with the pixel clock).
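To make the scan-out behaviour concrete, here is a minimal Python sketch of how an emulator might model it. The function name `scan_out` and its parameters are hypothetical, invented for illustration; the only facts taken from the description above are that the DAC cycles through a consecutive region of memory, one byte per clock, at 4 times the color carrier frequency.

```python
# The NTSC color subcarrier is 315/88 MHz; the DAC runs at four times that.
COLOR_CARRIER_HZ = 315_000_000 / 88
PIXEL_CLOCK_HZ = 4 * COLOR_CARRIER_HZ  # about 14.318 MHz


def scan_out(memory, start, end, count):
    """Return `count` consecutive DAC bytes, cycling through memory[start:end].

    Hypothetical helper: `start` and `end` stand in for whatever registers
    would define the framebuffer region on the real machine.
    """
    length = end - start
    return [memory[start + (i % length)] for i in range(count)]


# Tiny demonstration: a 4-byte "framebuffer" read for 6 pixel clocks.
ram = bytearray(16)
ram[4:8] = bytes([10, 20, 30, 40])
samples = scan_out(ram, 4, 8, 6)
print(samples)
```

The modulo wrap is what makes the output repeat: once the read position reaches the end of the region it starts again from the beginning, so a region holding exactly one frame (or two, for full NTSC compliance) displays as a still image.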
Technically, a still image in this format that is fully NTSC compliant requires 955,500 bytes (933KB) of memory (2 frames*910*525) because the color carrier phase alternates between frames. Given that the machine's memory size should be a power of 2 (so that we can map 32-bit addresses to physical addresses just by clearing the top bits), I decided to give it 1MB (any more and it stops being so Oldskool). This doesn't necessarily mean that you need to use almost all the memory to store the framebuffer - by tweaking the burst/sync patterns you can get a 912*525 (468KB) interlaced mode where you only need one frame, or a 912*262 (233KB) non-interlaced mode (again with only one frame), both of which should be compatible with any composite monitor. These have usable resolutions (including the overscan area) of about 720*480 and 720*240 respectively (with respective CGA-esque pixel aspect ratios of 5:6 and 5:12). So this gives a nice bit of flexibility to software with no extra hardware complexity.
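The arithmetic above can be checked with a few lines of Python. This is just a sketch: the names `frame_bytes` and `physical_address` are made up for illustration, and the masking function assumes that "clearing the top bits" means keeping the low 20 bits of a 32-bit address.

```python
RAM_SIZE = 1 << 20  # 1MB, a power of 2


def frame_bytes(samples_per_line, lines, frames=1):
    """Framebuffer size in bytes at one byte per sample."""
    return samples_per_line * lines * frames


def physical_address(addr32):
    """Map a 32-bit address to a physical one by clearing the top bits."""
    return addr32 & (RAM_SIZE - 1)


modes = {
    "fully NTSC compliant (2 frames)": frame_bytes(910, 525, frames=2),
    "interlaced (1 frame)": frame_bytes(912, 525),
    "non-interlaced (1 frame)": frame_bytes(912, 262),
}

for name, size in modes.items():
    # Integer division rounds down to whole kilobytes (1024 bytes each).
    print(f"{name}: {size} bytes, {size // 1024}KB, fits: {size <= RAM_SIZE}")
```

All three modes fit in the 1MB of RAM, and the smaller two leave more than half of it free for code and data.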
One disadvantage of this setup is that it may be possible for software to damage some composite monitors by creating sync pulses too often. So one would probably want to use an emulator when testing/debugging programs! Also, scrolling is fiddly because you have to move the image data independently of the sync/burst pulses. There are block move instructions which can move 4 pixels per clock to help with this, though.
Audio is done in a very similar way to video - set the beginning and end of the region of memory and the hardware cycles through that memory range turning bytes into samples and putting them through a DAC. Another register is used to set the sample rate (I decided a fixed rate would be too inflexible).
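As a sketch of how the audio side might be emulated, the Python class below steps a DAC one CPU cycle at a time. Note the assumptions: the post does not say how the sample rate register is encoded, so the `period` field here (CPU cycles per sample) is purely hypothetical, as are the class and field names.

```python
class AudioDac:
    """Cycles through memory[start:end], emitting one byte every `period` cycles."""

    def __init__(self, memory, start, end, period):
        self.memory = memory
        self.start = start      # beginning of the sample region
        self.end = end          # one past the last sample byte
        self.period = period    # hypothetical encoding: CPU cycles per sample
        self.position = start
        self.countdown = period

    def tick(self):
        """Advance one CPU cycle; return a sample byte, or None if none is due."""
        self.countdown -= 1
        if self.countdown > 0:
            return None
        self.countdown = self.period
        sample = self.memory[self.position]
        self.position += 1
        if self.position >= self.end:
            self.position = self.start  # wrap around, just like the video DAC
        return sample


ram = bytearray([1, 2, 3, 4])
dac = AudioDac(ram, start=1, end=3, period=2)
outputs = [dac.tick() for _ in range(6)]
print(outputs)
```

With a divider like this, a rate close to 44.1kHz would need a period of about 325 cycles at the 14.318MHz clock; a real design could equally well use some other encoding.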
Programs are just 1MB memory images which are dumped into RAM. The first two words of RAM are the instruction pointer and the stack pointer, so execution starts at the address pointed to by the word at 0. I/O registers are also at the start of the memory region.
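Loading a program in an emulator could then look something like the following Python sketch. It assumes 32-bit little-endian words (the byte order is not specified anywhere above, so that is a guess), and `load_image` is a hypothetical name.

```python
import struct

RAM_SIZE = 1 << 20  # programs are exactly one RAM image in size


def load_image(image):
    """Copy a program image into RAM and read the initial IP and SP.

    Assumption: words are 32-bit little-endian; the instruction pointer is
    the word at address 0 and the stack pointer is the word at address 4.
    """
    if len(image) != RAM_SIZE:
        raise ValueError("program image must be exactly 1MB")
    ram = bytearray(image)
    ip, sp = struct.unpack_from("<II", ram, 0)
    # Clear the top bits so both pointers land inside physical memory.
    return ram, ip & (RAM_SIZE - 1), sp & (RAM_SIZE - 1)


# Build a blank image whose header says "start at 0x100, stack at 0xFFFFC".
image = bytearray(RAM_SIZE)
struct.pack_into("<II", image, 0, 0x100, 0xFFFFC)
ram, ip, sp = load_image(bytes(image))
print(hex(ip), hex(sp))
```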
Most instructions are 1 cycle long (with the exception of instructions like the equivalents of "REP MOVSB" and "REP STOSB") to enable effects that require cycle counting to be done as easily as possible. Most instructions are 1 byte long (with the exception of instructions that have a 1-byte, 2-byte or 4-byte immediate argument). The CPU is 32-bit (to make addressing that 1MB of memory easy). The architecture is stack machine based (kind of like Java) - I have a soft spot for such architectures because they're really easy to write decent compilers for (registers are helpful at the hardware level for making processors fast, but that isn't a particular concern here). "Devolving the CPU" has some ideas for generating a good instruction set.
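To give a flavour of what a one-cycle-per-instruction stack machine looks like from the emulator's side, here is a tiny Python sketch. The opcodes (`PUSH8`, `ADD`, `DUP`, `HALT`) and their encodings are invented for this example and are not the machine's actual instruction set; the only properties borrowed from the description are 1-byte opcodes, an optional 1-byte immediate, 32-bit arithmetic, and one cycle per instruction.

```python
# Hypothetical opcodes, chosen only for this illustration.
HALT, PUSH8, ADD, DUP = 0x00, 0x01, 0x02, 0x03


def run(code):
    """Execute `code` and return (stack, cycles), counting 1 cycle per instruction."""
    stack, ip, cycles = [], 0, 0
    while True:
        op = code[ip]
        ip += 1
        cycles += 1
        if op == HALT:
            return stack, cycles
        elif op == PUSH8:
            stack.append(code[ip])  # 1-byte immediate follows the opcode
            ip += 1
        elif op == ADD:
            b = stack.pop()
            a = stack.pop()
            stack.append((a + b) & 0xFFFFFFFF)  # wrap to 32 bits
        elif op == DUP:
            stack.append(stack[-1])
        else:
            raise ValueError(f"unknown opcode {op:#x}")


# (2 + 3), duplicated and added to itself.
program = bytes([PUSH8, 2, PUSH8, 3, ADD, DUP, ADD, HALT])
stack, cycles = run(program)
print(stack, cycles)
```

Because every instruction here costs exactly one cycle, the cycle count of a straight-line routine is just its instruction count, which is what makes cycle-exact effects tractable.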
Because of the cycle counting and simplicity requirements, there are no hardware interrupts - all timing must be done by cycle counting. This also means that there are no instructions whose purpose is to determine some value N and take O(N) time to do it - so no "REP SCASB" in this architecture. I don't think that is very useful for demos anyway, and it's a non-goal of this design to be suitable for general purpose computing tasks like searching and parsing.
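Since the CPU runs in sync with the pixel clock, timing budgets fall straight out of the video geometry. The Python sketch below works this out for the fully compliant 910*525 frame; the helper name `padding_cycles` is hypothetical, and it assumes the routine being timed consists of fixed-length instructions.

```python
CPU_CLOCK_HZ = 4 * 315_000_000 / 88   # same as the pixel clock, about 14.318 MHz
CYCLES_PER_LINE = 910                 # one CPU cycle per output sample
LINES_PER_FRAME = 525
CYCLES_PER_FRAME = CYCLES_PER_LINE * LINES_PER_FRAME


def padding_cycles(work_cycles, lines=1):
    """Cycles of filler needed so a routine takes exactly `lines` scanlines."""
    budget = lines * CYCLES_PER_LINE
    if work_cycles > budget:
        raise ValueError("routine does not fit in the requested scanlines")
    return budget - work_cycles


print(CYCLES_PER_FRAME)                 # cycles available per frame
print(CPU_CLOCK_HZ / CYCLES_PER_FRAME)  # frames per second
print(padding_cycles(300))              # filler for a 300-cycle per-line routine
```

A per-scanline effect therefore has 910 cycles to work with, and anything that needs to stay locked to the display has to be padded out to an exact multiple of that.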
It would be pretty easy to generalize this architecture to make a games machine - just have a register whose bits reflect the up/down status of controller buttons.
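Reading such a register from software might look like the Python sketch below. The bit assignments are entirely hypothetical (nothing above specifies which bit maps to which button); the point is only that with no interrupts, a game would poll the register and compare it with the previous reading.

```python
# Hypothetical bit layout for the controller register.
BUTTON_UP = 1 << 0
BUTTON_DOWN = 1 << 1
BUTTON_FIRE = 1 << 2


def pressed(register, mask):
    """True if any button in `mask` is currently held down."""
    return (register & mask) != 0


def newly_pressed(previous, current):
    """Bits for buttons that went down since the previous poll."""
    return current & ~previous


previous = BUTTON_UP
current = BUTTON_UP | BUTTON_FIRE
print(pressed(current, BUTTON_FIRE))
print(newly_pressed(previous, current) == BUTTON_FIRE)
```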
Now, while I'd like to believe that building such a machine would revitalize the Oldskool demo scene by providing a new focal point, I suspect the more realistic view is that nobody would actually be interested: programming such a machine would be a skill with no application elsewhere, the limitations of the machine would constrain the possible effects, and there would be none of the nostalgia inspired by actual Oldskool hardware. But it's still an interesting exercise!