You might know that according to the NTSC standard for TV signals in the USA, the display is updated 60 times per second. But did you know that it's not exactly 60? It's actually closer to 59.94 (60000/1001, to be exact). This Wikipedia article explains why.
Early microcomputers (including early PCs), which were designed to be plugged into TVs, didn't have exactly the same frame rate as TV pictures - most of them used 1640625/27379 (which is closer to 59.92) instead because it was slightly easier to build that way (fortunately TVs have enough tolerance to display the slightly out-of-spec pictures correctly). I wrote this to explain (amongst other things) the origins of these "magic" numbers.
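As a rough check on where both numbers come from, here is a small sketch. The inputs are not stated in the post itself and are assumptions taken from the usual description of the hardware: an NTSC colour subcarrier of 315/88 MHz, 227.5 subcarrier cycles per scanline and 262.5 scanlines per field for broadcast TV, and for the PC a master clock of four times the subcarrier with 912 clock ticks per scanline and 262 scanlines per frame.

```python
from fractions import Fraction

# Assumed inputs (not given in the post): standard NTSC and PC/CGA timing figures.
SUBCARRIER_HZ = Fraction(315_000_000, 88)   # NTSC colour subcarrier, about 3.579545 MHz

# Broadcast NTSC: 227.5 subcarrier cycles per scanline, 262.5 scanlines per field.
ntsc_rate = SUBCARRIER_HZ / Fraction(455, 2) / Fraction(525, 2)

# PC with CGA: master clock is 4x the subcarrier (about 14.31818 MHz),
# 912 master clock ticks per scanline, 262 scanlines per frame.
pc_master_clock_hz = 4 * SUBCARRIER_HZ
cga_rate = pc_master_clock_hz / (912 * 262)

print(ntsc_rate, float(ntsc_rate))   # 60000/1001, about 59.94
print(cga_rate, float(cga_rate))     # 1640625/27379, about 59.92
```

Using exact fractions rather than floats makes it easy to see that the two "magic" ratios fall straight out of the clock and the line and frame counts.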
This 59.92Hz number turned out to be very important for finding an obscure bug in California Games that I was hitting whilst trying to get the "CGA MORE-color mode" working on MESS. There is a routine to determine if the frame rate is close enough to 60Hz for it to be likely that this effect would work. The routine seems to be trying to determine if the frame time is in the range (1/60)s +/- 500us (presumably the authors didn't know that it was actually supposed to be closer to 59.92 than 60). However, it puts the timer chip in the wrong mode, causing it to count down twice as fast. So in fact it is instead determining if the frame time is either in the range (1/60)s-500us to (1/60)s or in a similar 500us range at around 1/120s. The "normal" value lies right on the edge of the range it's actually measuring. Of course, because on real hardware the rate is 22us less than (1/60)s (pretty reliably so, since they are based on the same clock signal) this works fine in practice, but I'm sure it's not what the authors meant!
This bug was preventing the mode from working on MESS because the MESS frame rate was set to exactly 60Hz (slightly too fast) and the frame rate test was failing (but only just). The fix is for MESS to use a 59.92Hz frame rate instead - by making the emulator almost imperceptibly more accurate, the effect works!
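To put a number on how small the difference is, the sketch below compares the frame period at exactly 60Hz with the period at 1640625/27379 Hz. It only does the arithmetic on the two rates quoted above and does not model the game's timer routine.

```python
from fractions import Fraction

real_rate_hz = Fraction(1640625, 27379)   # frame rate of the real hardware (from the post)
emulated_rate_hz = Fraction(60)           # frame rate the emulator was using

def period_us(rate_hz):
    """Frame period in microseconds for a given frame rate in Hz."""
    return 1_000_000 / rate_hz

real_period = period_us(real_rate_hz)
emulated_period = period_us(emulated_rate_hz)
difference_us = real_period - emulated_period

print(f"real:       {float(real_period):.2f} us")
print(f"emulated:   {float(emulated_period):.2f} us")
print(f"difference: {float(difference_us):.2f} us")
```

The two periods differ by a little over 21us, in line with the roughly 22us figure mentioned above, which is tiny compared with the 500us window but enough to matter when the expected value sits right on the edge of that window.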
So, based on that, how many times per screen do you think you could change pals? Could you do it every other scan line? What about background color changes?
Could you create a 320x200 mode that changes pals every line, so you would have 7x16 pals to choose from? 4 colors per line?
For a stock 5150 PC or 5160 XT or exact equivalent with CGA, the theoretical minimum is 15 CPU cycles, which is how long an "inc ax; out dx,al" or "dec ax; out dx,al" takes. That gives you just over 20 palette changes per scanline (as long as you don't mind only being able to go up or down a colour on each change) or about 5300 times per screen. I've achieved this with my XT (picture). It also requires turning off all the interrupts and the DRAM refresh (which is safe with this code, since the big unrolled loop touches all the DRAM rows). And the picture will be totally different on a faster machine (even something like a V20). So that's not terribly useful.
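The figures above can be reproduced with a bit of arithmetic. This is only a sketch: the 15-cycle cost comes from the comment above, while the scanline length (912 master clock ticks), the CPU clock being a third of the master clock, and the 262 scanlines per frame are assumptions about the standard 5150/5160 plus CGA timing that the comment does not state.

```python
from fractions import Fraction

# Assumed timing figures for a 4.77MHz 8088 with CGA (not stated in the comment).
MASTER_TICKS_PER_SCANLINE = 912
MASTER_TICKS_PER_CPU_CYCLE = 3
SCANLINES_PER_FRAME = 262

# From the comment: "inc ax; out dx,al" takes 15 CPU cycles.
CYCLES_PER_CHANGE = 15

cycles_per_scanline = MASTER_TICKS_PER_SCANLINE // MASTER_TICKS_PER_CPU_CYCLE
changes_per_scanline = Fraction(cycles_per_scanline, CYCLES_PER_CHANGE)
changes_per_frame = changes_per_scanline * SCANLINES_PER_FRAME

print(cycles_per_scanline)            # CPU cycles in one scanline
print(float(changes_per_scanline))    # just over 20
print(float(changes_per_frame))       # about 5300
```

Note that the per-frame total counts every scanline, including the ones in the overscan and blanking periods.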
Changing the palette on each line is possible and can even be done in a portable way by checking the display enable bit to synchronize the program with the raster beam. However, it has to be done with interrupts off because it's so timing sensitive, so you have to do your keyboard access (along with everything else) in the vertical overscan/blanking period. California Games' method of using a timer interrupt to do the palette changes can't change the palette often enough, but does have the advantage of being transparent to the rest of the program (aside from a slight slowdown).
Ok, but if the only goal was to display a static image then the limitations you are describing are not a problem.
What if you wanted to be able to get to any color and palette? Could you do it 10 times per scan line? 32 pixels per palette, but what if you wanted to change the color as well? I am wondering if you could get 4 colors per scan line from every palette. Also, if you are timing it correctly and change the background, would that be 4 unique colors per scan line? What I am getting at is you could optimize a picture to the best combination with whatever the limitations were and have a 320x200x16 color picture on CGA - albeit with some weird colors, but error diffusion would help there.
I know I could write a great converter to optimize, the code to display, not so much...
I'd need to run some tests to be sure (which I won't be able to do for a month or so as my XT is in transit) but I think you should be able to change the port 0x3d9 value in about 19 CPU cycles or 28.5 pixels on a 5150/5160. If it does turn out to be an odd number of CPU cycles then every other change is going to result in a pixel whose left half is one colour and whose right half is another (assuming it's not the same colour in both palettes). The body of the code would just be a long string of "mov ax,XX; out dx,al". So in 320x200 mode that would give you access to all 16 background colours and 4 of the 6 palettes (dark red/dark green/brown, light red/light green/yellow, dark magenta/dark cyan/grey and light magenta/light cyan/white).
Changing the palette every 32 pixels exactly would probably be more difficult, as that's not a whole number of CPU cycles, so you'd need a 22 CPU cycle change and then two 21 CPU cycle changes. It might be possible by adding some harmless instructions like SALC (which takes 3 cycles) and CLC (which takes 2), since this code seems to be limited by the execution unit rather than the bus. I expect that 28.5 pixels would be better than 32 anyway though.
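Here is a minimal sketch of the cycle-to-pixel arithmetic used above. It assumes that in 320x200 mode one CPU cycle lasts 1.5 pixels (which is what "19 CPU cycles or 28.5 pixels" implies); the helper names are made up for illustration. The second function picks whole-cycle delays whose running total never falls behind the ideal spacing, which is one way to arrive at the 22, 21, 21 pattern.

```python
import math
from fractions import Fraction

# Assumption: in 320x200 mode one CPU cycle spans 1.5 pixels.
PIXELS_PER_CPU_CYCLE = Fraction(3, 2)

def cycles_to_pixels(cycles):
    """Width in 320x200 pixels covered by a given number of CPU cycles."""
    return cycles * PIXELS_PER_CPU_CYCLE

def cycle_cadence(pixels_per_change, n_changes):
    """Whole-cycle delays whose running total tracks an ideal pixel spacing."""
    ideal_cycles = Fraction(pixels_per_change) / PIXELS_PER_CPU_CYCLE
    delays = []
    elapsed = 0
    for k in range(1, n_changes + 1):
        target = math.ceil(ideal_cycles * k)   # first whole cycle at or after the ideal point
        delays.append(target - elapsed)
        elapsed = target
    return delays

print(float(cycles_to_pixels(19)))   # 28.5
print(cycle_cadence(32, 3))          # [22, 21, 21]
```

The three delays add up to 64 cycles, which is exactly 96 pixels, so the pattern repeats every three changes without drifting.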
If you wanted to use the red/cyan/white palettes as well (or 640x200 mode) things become more complicated because the +B/W and +1BPP bits are in a different register (port 0x3d8 instead of 0x3d9). So if you wanted to change one of those you'd have to do something like "mov ax,XX; dec dx; out dx,al; inc dx" which would probably take about 27 CPU cycles or 40.5 pixels, and you can't change the palette register as well in that stretch. You could change both registers at once each time with "mov ax,XXXX; out dx,ax" but that's going to take something like 28 CPU cycles or 42 pixels, so takes a bite out of your horizontal resolution. Also the port 0x3d8 and port 0x3d9 registers won't change at the same time (the change to port 0x3d9 would happen probably about 6 pixels to the right of the change to port 0x3d8). So you can simplify things a lot by sticking to the 4 palette system.
I'd love to help you write a static image displayer for this mode. Do you have a machine that you can run it on? If so I'll send you a program to time exactly how long the palette change sequence takes.
It'll be interesting to see if better results can be obtained with this program than with 40-column text mode with 1-pixel high rows, which I've been playing with - here's an example of the kind of results that should be possible with that technique on an RGB monitor.
I have two old Tandys, but I'm not sure they are appropriate. I'll just buy an old IBM 8086 if I have to. I have two CGA monitors that work, so I am most of the way there. If you are running 40x25x16 then you are running 320x200x16 and you have a wide range of dot patterns to use. I thought you could only run 2 pixels high? 80x25 at 2 pixels high gives the 160x100 mode, right?
Somebody sent me a converter that uses the 2 pixel high mode to create 640x200x16, again limited by the first 2 rows of dots. If you were to use only the first row of dots then I believe you'd run out of memory.
Your 16 color mode looks amazing, not sure that could be topped. I had a few ideas too to try to create a 30 Hz flicker (used that on the Mattel Aquarius myself) to get 85 colors in 160x100 mode. If you could do that with the 320x200x16-limited mode you'd have graphics that CGA never dreamed of. Obviously flicker isn't the best solution, but it is one we have at our disposal.
I had the same thought on a flicker mode for 320x200x4 - if you don't repeat any colors you'd get a 16 color mode from a wide variety of potential palettes. Now if you combined flicker with your 320x200 mode timings then we may really have something.
In my fantasy visions you would flicker between a 320x200x4 mode and a 40x25 text mode and optimize the two together to create new images. 30 Hz flicker isn't so bad.
If you go ahead and send me the program I can try it on my Tandy XT - unless the CGA mode isn't compatible, otherwise I'll go ahead and get a new rig. I've been wanting to scratch the CGA itch for a long time, and this gives me the perfect excuse to learn some x86 assembler rather than the Z80 I am familiar with. Any advice on a good development environment? Notepad and TASM? NASM? MASM? What's your setup for XT coding?
Whoops - I replied in the wrong place, so you probably didn't get notified.
The problem with the Tandy 1000 series is that they don't have DMA I think, so you can't turn off the DRAM refresh (without disabling video) and get exactly predictable timings. I'm not sure, but they might also have some extra wait states when accessing the parts of the memory shared with video. The latter problem is solved when running from expansion board memory but not the former. So I think you'll want an IBM 5150 or 5160 (a 5155 might also work) - something with an Intel 8088 running at 4.77MHz but not a PCjr or Tandy. Here's the timing program if you want to try it anyway but I haven't been able to test it - it doesn't work with DOSBox, I think because it messes with interrupts and DMA in such a low-level way. I'm afraid it's not very spectacular - it just outputs some timing numbers.
In 80x25 text mode you can't go lower than 2 scanlines per character row because of memory limitations, but in 40x25 text mode that doesn't apply - there is another limitation that kicks in - the limit on the number of character rows per screen that the CRTC can do (128, plus a maximum of 31 extra scanlines). So you have to have two or more CRTC screens per CRT frame, which is a bit fiddly but possible.
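Both limits can be checked with some quick arithmetic. This is a sketch under assumptions that aren't spelled out above: 16KB of CGA video memory, 2 bytes (character plus attribute) per text cell, 200 visible scanlines and 262 scanlines per CRT frame. The 128-row and 31-scanline CRTC limits are the ones quoted in the comment.

```python
import math

# Assumed figures (not stated in the comment).
VIDEO_RAM_BYTES = 16 * 1024
BYTES_PER_CELL = 2            # character byte + attribute byte
VISIBLE_SCANLINES = 200
SCANLINES_PER_FRAME = 262

# From the comment: CRTC limits on rows and extra scanlines.
CRTC_MAX_ROWS = 128
CRTC_MAX_EXTRA_SCANLINES = 31

def max_text_rows(columns):
    """How many distinct character rows fit in video memory."""
    return VIDEO_RAM_BYTES // (columns * BYTES_PER_CELL)

def min_scanlines_per_row(columns):
    """Smallest row height that fills the visible area without running out of memory."""
    return math.ceil(VISIBLE_SCANLINES / max_text_rows(columns))

def crtc_frames_per_crt_frame(scanlines_per_row):
    """CRTC screens needed to cover one full CRT frame at a given row height."""
    max_scanlines = CRTC_MAX_ROWS * scanlines_per_row + CRTC_MAX_EXTRA_SCANLINES
    return math.ceil(SCANLINES_PER_FRAME / max_scanlines)

print(min_scanlines_per_row(80))      # 2: memory limits 80-column mode
print(min_scanlines_per_row(40))      # 1: 40-column mode has room for 200 rows
print(crtc_frames_per_crt_frame(1))   # 2: one CRTC screen tops out at 159 scanlines
```

So in 40-column mode memory is no longer the constraint, and it's the 159-scanline ceiling of a single CRTC screen that forces the split into two or more screens per frame.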
Flicker is an interesting idea and one that I haven't played with. Again, memory space is a problem though. In terms of number of colours, you can do even better (probably) by using a composite monitor instead of an RGB one - then every 4 pixel repeating pattern (2 pixels in 320x200 modes) gives a different colour, so you can get results like this or (with reduced screen area in 80-column text mode) this. They don't look quite as good on real hardware but I think they should improve once I improve the accuracy of my CGA simulation.
It might be interesting to see what a 40-column text mode with 2 scanlines per row or reduced screen area looks like with flicker though. Or as you suggested flickering between two different modes to get both a large number of colours and fine detail.
At the moment I'm using YASM to assemble 8088 assembly code, but eventually I want to write my own assembler which keeps track of the timings for me. All my code is on GitHub if you want to take a look - I'm afraid it's not very well organized for public consumption though. My setup involves using an Arduino plugged into my XT's keyboard port and a spliced-in reset line, along with a program that makes the Arduino pretend to be the IBM manufacturing test device that can be used to load code onto the machine at boot time even with no other ports or disks plugged in. That way I can control the whole thing remotely from a modern PC, and because this code runs before the memory test, it makes for a very quick turnaround time. If you don't mind a bit of hardware hacking, I can definitely recommend this method.
Wow - you have a nice setup. I can set up my XT right next to my PC, and the memory load option is really nice, but I may just go ahead and get a machine with DOS installed and network it. You can't get the same control through DOS .COM files? Hardware hacking sounds nice, but that is a whole project in and of itself. The composite mode wouldn't work on an RGB monitor, but it is a good idea. This is fun talking about this... Now, I have two other guys I have reached out to who are fans of CGA - we should set up a place to talk about this stuff. Where to? Vogons?
The main advantage of my system over DOS is that I can (remotely) do a hardware reset when I inevitably crash the system, and recover very quickly to a known-good state (also it was necessary to get the machine working in the first place!)
Vogons is good. The VCF might be even better as I think Vogons has more of a 386/486/VGA/games/emulation focus (though I have had some good CGA discussions there too).