On the original IBM PC 5150 (and its mostly electrically-equivalent derivatives, the 5155 and 5160) the operation of the bus (the data channel between the CPU and its memory and peripherals) is interrupted 2,187,500 times every 33 seconds (a rate of about 66KHz) for 11/13,125,000 of a second each time (i.e. 4 out of every 72 CPU cycles). During that time, very little of the machine can operate - no RAM can be read or written and no peripherals can be accessed (the CPU might be able to continue doing its thing if it's in the middle of a long calculation, and the peripherals will continue to operate - it's just that nothing can communicate with anything else).
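For those who prefer the arithmetic spelled out: the CPU clock is 157,500,000/33 Hz (about 4.77MHz), one refresh happens every 72 of those cycles, and 4 of them are lost each time. A trivial sketch of the sums, purely for illustration:

    /* Purely illustrative: the refresh numbers from the paragraph above, in C. */
    #include <stdio.h>

    int main(void)
    {
        double cpu_hz     = 157500000.0 / 33.0;  /* ~4.77MHz, the 5150's CPU clock */
        double refresh_hz = cpu_hz / 72.0;       /* one refresh every 72 CPU cycles */
        double overhead   = 4.0 / 72.0;          /* 4 of those 72 cycles are lost   */
        printf("refresh rate: %.0f Hz (about 66KHz)\n", refresh_hz);
        printf("bus overhead: %.1f%%\n", overhead * 100.0);
        return 0;
    }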
Why does this happen? Well, most computers (including the one this blog post is about) use DRAM (Dynamic RAM) chips for their main memory, as it's fast and much cheaper than the slightly faster SRAM (Static RAM) chips. That's because each DRAM bit consists of a single capacitor and transistor as opposed to the 4 or more transistors that make up a bit of SRAM. That capacitor saves a lot of hardware but it has a big disadvantage - it discharges with time. So DRAM cells have to be "refreshed" periodically (every 2ms for the 16 kbit 4116 DRAMs in the original 5150) to maintain their contents. Reading a bit of DRAM involves recharging the capacitor if it's discharged, which refreshes it.
But a computer system won't generally read every bit of RAM in any given interval. In fact, if it's sitting in a tight idle loop it might very well not access most of the memory for many minutes at a time. But we would be justified in complaining if our computers forgot everything in their memories whenever they were left idle! So all general-purpose computers using DRAM have some kind of circuitry for automatically accessing each memory location periodically to refresh it. In the 5150, this is done by having channel 1 of the 8253 Programmable Interval Timer (PIT) hooked up to the Direct Memory Access (DMA) controller's channel 0. The BIOS ROM programs the PIT for the 66KHz frequency mentioned above, and programs the DMA controller to read a byte each time it's triggered on channel 0. The address it reads counts up by one for each access, from 0 to 65,535, then wraps back around to 0 again.
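For the curious, the setup is roughly equivalent to the following. This is just a sketch in C, not the BIOS's actual code (which is in assembly), and it assumes a Borland-style outportb() from <dos.h>; the port numbers and command bytes are the standard ones for the 8253 and 8237 in the PC.

    #include <dos.h>   /* outportb() - Borland-style, an assumption for this sketch */

    void setup_dram_refresh(void)
    {
        /* 8253 PIT: select channel 1, write LSB only, mode 2 (rate generator). */
        outportb(0x43, 0x54);
        /* Divisor of 18: 1.19318MHz / 18 = ~66.3KHz, one request every 72 CPU cycles. */
        outportb(0x41, 18);

        /* 8237 DMA: channel 0, single transfer, autoinitialize, read from memory. */
        outportb(0x0B, 0x58);
        outportb(0x0C, 0x00);  /* clear the byte-pointer flip-flop */
        outportb(0x00, 0x00);  /* channel 0 address = 0... */
        outportb(0x00, 0x00);
        outportb(0x01, 0xFF);  /* ...count = 0xFFFF, so it walks 0-65,535 and wraps */
        outportb(0x01, 0xFF);
        outportb(0x0A, 0x00);  /* unmask channel 0 so DREQ0 from the PIT is serviced */
        outportb(0x08, 0x00);  /* enable the DMA controller with default settings */
    }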
If the DRAM needs to be refreshed every 2ms, why does the refresh circuit run at 66KHz and not 500Hz, or for that matter 8.192MHz? To answer that question, we need to know something about how the memory is organized. The original 5150 had banks of 8 chips (plus a 9th for parity checking). Each chip is 16 kbit, so a bank is 16KBytes. If you had a full 640KB of RAM organized this way, that would be 40 banks or 360 separate chips! (By the time that much memory became common, we were mostly using 64 kbit chips, though.) Within each chip, the 16 kbits are organized in a grid of 128 "rows" and 128 "columns". To read a bit, you input the "row" address, then the "column" address, then read back the result (hence the chips could have just 16 pins each, as each address pin corresponds to both a "row" bit and a "column" bit). Happily, whenever a row is accessed, all the DRAM cells on that row are refreshed no matter what column address is ultimately accessed. Also, the low 7 bits of the physical byte address correspond to rows and the next 7 bits correspond to columns (the remaining 6 bits correspond to the bank address). So you could actually get away with just refreshing addresses 0-127 instead of addresses 0-65,535 on this machine (though there was a good reason for doing it the way IBM did, as we'll see later).
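In code terms, the split looks something like this (purely illustrative; only the row number matters for refresh):

    /* How a 20-bit physical address splits up across the banks of 4116 chips
       described above. */
    void split_address(unsigned long address,
                       unsigned *row, unsigned *column, unsigned *bank)
    {
        *row    = (unsigned)( address        & 0x7F);   /* bits 0-6:   DRAM row    */
        *column = (unsigned)((address >> 7)  & 0x7F);   /* bits 7-13:  DRAM column */
        *bank   = (unsigned)((address >> 14) & 0x3F);   /* bits 14-19: 16KB bank   */
    }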
Electronic components (including DRAM chips) are manufactured with certain margins of error so that they reliably meet their specifications, which means one can often get away with reprogramming the PIT to reduce the DRAM refresh rate a bit and squeeze a little more performance out of these old machines. It was a common hack to try, and I remember trying it on the family computer (an Amstrad PC1512) after reading a little about DRAM refresh in a computer magazine. I seem to recall that I got the divisor up from the standard 18 to maybe 19 or 20 before the machine became unstable, but the performance improvement was too small to notice, so the little .COM file I made with DEBUG never made it as far as my AUTOEXEC.BAT.
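If you want to try the same hack, the reprogramming itself is tiny - something like this sketch (again assuming a Borland-style outportb(); push the divisor up gradually and expect crashes once you exceed your particular chips' margins):

    #include <dos.h>   /* outportb() - Borland-style, an assumption for this sketch */

    void set_refresh_divisor(unsigned char divisor)
    {
        outportb(0x43, 0x54);     /* PIT channel 1, LSB only, mode 2 */
        outportb(0x41, divisor);  /* 18 is the default; 19 or 20 slows refresh slightly */
    }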
For many of the timing experiments and tight loops I've been playing with on my XT, I've been disabling DRAM refresh altogether. This squeezes out a bit more performance, which is nice, but more importantly it makes the timings much more consistent (which is essential for anything involving lockstep). However, whenever I've told people about this the reaction has been "doesn't that make the machine crash?" The answer is "no, it doesn't - if you're careful". If you turn off the refresh circuitry altogether you have to be sure that the program you're running accesses each DRAM row itself, which happens automatically if you're scanning through consecutive areas of RAM at a rate of more than 66KB/s, or for that matter if you've done enough loop unrolling that your inner loop covers more than 127 consecutive bytes of code. Since these old machines don't have caches as such, unrolled loops are almost always faster than rolled-up ones anyway, so that's not a great hardship.
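There's more than one way to switch the refresh off; one simple sketch (assuming a Borland-style outportb(), and not necessarily the exact method I use) is to mask DMA channel 0 so that the PIT's refresh requests are never serviced:

    #include <dos.h>   /* outportb() - Borland-style, an assumption for this sketch */

    void refresh_off(void) { outportb(0x0A, 0x04); }  /* set the mask bit for channel 0 */
    void refresh_on(void)  { outportb(0x0A, 0x00); }  /* clear it again afterwards */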
Not all of the machines I'm tinkering with use 4116 DRAM chips. Later (64KB-256KB) 5150 motherboards and XTs use 4164 (64 kbit) chips, and modified machines (and possibly also clones) use 41256 (256 kbit) chips. The principles are exactly the same, except that these denser chips are arranged as 256x256 and 512x512 bits respectively, so they have 8 or 9 row bits, and instead of accessing 128 consecutive bytes every 2ms you have to access 256 consecutive bytes every 4ms or 512 consecutive bytes every 8ms respectively (the PIT and DMA settings were kept the same for maximum compatibility - fortunately the higher density DRAMs decay more slowly, so this is possible). So when disabling DRAM refresh, one should be sure to access 512 consecutive bytes every 8ms, since that will work for all 3 DRAM types.
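With the hardware refresh off, the program itself has to do the equivalent work. Here's a minimal sketch of a routine you could call at least once every 8ms (in practice you'd fold this into whatever your inner loop is already doing, most likely in assembly):

    /* Reading 512 consecutive bytes strobes every row on 4116, 4164 and 41256
       parts alike, as described above. The volatile qualifier stops the compiler
       from optimizing the reads away. */
    static volatile unsigned char refresh_area[512];

    void manual_refresh(void)
    {
        unsigned int i;
        unsigned char dummy = 0;
        for (i = 0; i < 512; i++)
            dummy ^= refresh_area[i];  /* each read refreshes the whole row it lands on */
        (void)dummy;
    }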
The cycle-exact emulator I'm writing will be able to keep track of how long it's been since each DRAM row has been refreshed and will emit a warning if a row is unrefreshed for too long and decays. That will catch DRAM refresh problems that are missed due to the margins of error in real hardware, and also problems affecting only 41256 chips (my machine uses 4164s).
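The bookkeeping for that is straightforward - something along these lines (a sketch, not the emulator's actual code; the names and constants here are made up for illustration):

    #include <stdio.h>
    #include <stdint.h>

    #define ROWS            512      /* enough rows for 41256 parts */
    #define MAX_REFRESH_MS  8        /* rated refresh interval for 512-row chips */
    #define CYCLES_PER_MS   4773     /* ~4.77MHz CPU clock, rounded */

    static uint64_t last_refresh[ROWS];  /* cycle number of each row's last strobe */

    /* Called for every RAS strobe the emulated hardware performs - normal reads,
       writes and refresh cycles alike. */
    void on_row_access(uint64_t cycle, unsigned row)
    {
        uint64_t limit = (uint64_t)MAX_REFRESH_MS * CYCLES_PER_MS;
        if (cycle - last_refresh[row] > limit)
            printf("Warning: row %u went %llu cycles without refresh - "
                   "its contents may have decayed\n",
                   row, (unsigned long long)(cycle - last_refresh[row]));
        last_refresh[row] = cycle;
    }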
Modern PCs still use DRAM, and still have refresh cycles, though the overhead has gone down by an order of magnitude and the exact mechanisms have changed a few times over the years.
Messing with things on such a low level may admittedly be a little bit over my head - an informative read, nonetheless. Since your goal is a cycle-exact emulator, I wonder how computationally costly it would be to implement such a degree of precision. Accuracy-minded emulators are already pushing the limits of consumer-level processing power; bsnes is kind of notorious for requiring a 3 GHz CPU, and going down the same road with 8088-based PC emulation could conceivably end up in the same ballpark, or not far behind... curious to know if you've already crunched some numbers on this question.
It's not so much a matter of crunching numbers as making it correct first and then optimizing it until it's as fast as I want it to be. I have a lot of tricks up my sleeve. At the moment my CPU emulator is really slow (in the ballpark of just thousands of instructions per second) but that's mostly because I'm outputting tons of debugging information for each cycle.
Once that's removed, I think the speed should be a bit more reasonable, since the emulator doesn't actually have to do that much more than (say) MESS's 8086 emulation - it's mostly just a matter of keeping more careful track of how many cycles we've burned through. So I think getting within a factor of two of MESS should be achievable, even when emulating crazy things like mid-scanline raster effects, CGA logic delays, and CRT effects.
If it's still too slow using "normal" techniques, there are some other things I'm planning to try, such as writing my own compiler to compile it with, which can intersperse things to make it faster. Hopefully that won't be necessary for the basic emulator, but it probably will be for the modular emulator I eventually want to write.
Interesting ... does this mean that machines with shared "main" and "graphics" RAM effectively don't need refreshing? Unless you do some strange things with the addressing and manage to break it that way? Seeing as this is only necessary for the PC owing to all of its graphics hardware being on add-in boards and using their own memory.
EG something like an Amiga will spin through 320 bytes per line in its densest screen modes. It only scans at 15.7kHz, but presumably that's enough as you'll have got through all of a 64kbit and most of a 256kbit IC in that time, certainly visiting more than 66,000 addresses per second.
(Even its lowest-rez mode would get through the equivalent of about 600kbytes/sec)
(The question comes whether this is enough for something like a Spectrum or C64, as they will both have lower density chips, and spin through a lot less data per line... EG a Spectrum has about 7kbyte of graphics memory... refreshing 50 times per second... nope, we're still good, as that's 350kbyte/sec equivalent, far faster than the PC would do it (and of course, incurring a CPU-to-RAM speed hit as a result), and easily enough for the 16kbit DRAMs the early models used)
All DRAM needs refreshing, but yes - many of the 80s microcomputers were designed to have their DRAM refreshed by their video address generators, including the Apple II, the BBC Micro and the PC's CGA card. However, all three of the systems you mentioned have dedicated DRAM refresh circuits. In the Spectrum, it's integrated into the CPU. In the C64 and Amiga it's part of the VIC-II and Agnus respectively (though in each case the refresh addresses are not the same as the video access addresses).
Integrating DRAM refresh with video does have some downsides - it limits your video access patterns, putting a constraint on your video memory layout and making it more complicated to do certain animation techniques. Probably the former (the layout constraint) was more of a concern to designers of that era - along with the fact that they had a perfectly good non-video way of doing the refresh cycles.
An interesting bit of trivia: The Commodore VIC-20 did not have refresh built into the chipset. Commodore opted for static RAM, so they did not have to handle the refresh problem. The machine only had 5 KB of RAM anyway, so the extra cost was not that much of an issue.
It's the only machine from that era that I know of that uses static RAM and needs no refresh.
That is interesting - thanks, Scali! I think the RAM in the Atari 2600 was static too, though that's a few years earlier.
Yes, I think you're right.
The 2600 didn't even have any RAM chip as such, but the 128 bytes(!) of memory in the system were integrated in the PIA 6532 chip, which handled I/O and timers.
Interesting ... I did figure that, at least for the bitplane-based machines, you'd have to do a sequential read on at least a line-by-line basis (even if you changed the video page etc. between lines), so it would be a fairly effective and "free" way of effecting a whole-RAM refresh. Only programmers doing quite complex or advanced display-mashing techniques would have to pay it particular attention and override the default mechanism (implementing their own, with the performance penalty being worth it for the flexibility). Anyone using regular screen access, or even simpler techniques like splitscreen etc., wouldn't have to worry, as each column (row? I forget) in the matrix would be touched regularly enough to avoid fading out, so long as a single text row's worth of bitmap was shunted from RAM to screen with reasonable regularity. (And a separate refresh routine could maybe take place in the vblank area, but that wouldn't be as much of an issue as it wouldn't directly interfere with display drawing by its very nature anyhow.)
Tile-based displays may be more problematic though, as you can't predict the access pattern except for the relatively much smaller and lower frequency tile array block...
As for the CGA, I was considering that as being entirely separate to the main computer anyhow, as it doesn't use system RAM; hence the computer needs its own refresh anyhow. The CGA card doesn't do much other than taking stuff off the bus, putting it into the onboard VRAM, then repeatedly and sequentially reading it out into the monitor port (and/or whatever passes for a DAC on the composite video line...).
The Amiga I guess needs separate refresh, now I think about it, because of the "fast RAM". Perhaps it doesn't have to bother when it comes to "chip RAM", as that gets buzzed through by the vidchip, but the fast RAM is purely for the CPU's use, much like a PC's system memory... The C64 I'm not familiar enough with, but I coulda sworn the background is bitmap and a large proportion of the bus time is given over to sequentially reading the video field data from RAM to VIC-II even though it has sprites as well? Is the separate refresh just for Vblank, or does it also run during the display time for some reason?
I think tile-based displays aren't really a problem because refresh only needs to touch each row, not each bit. So as long as the "screen position to tile" map covers all the rows, the "tile to pixels" map data will get refreshed as well.
The CGA isn't responsible for refreshing the system RAM, but as you said it does have its own VRAM, which is also DRAM and does need to be refreshed. That RAM doesn't partake in the system refresh, and the CGA card doesn't have any dedicated refresh circuitry, so it must be refreshed by the display addresses. Fortunately the 6845 continues to generate addresses even in the non-display areas, so it works fine for this task.
The C64 can't use the display address for refresh because the data would decay during vertical blanking as you said. However, it's simpler to refresh constantly (even in the active display area) than to add circuitry to turn off the refresh during the active time. It doesn't slow the system down either, since the Vic-II does a memory read every clock cycle anyway.
[…] you want to read more about DRAM refreshes on PC, and how to get around it, there is some more in-depth information on Andrew Jenner’s blog. This idea is similar to the ‘stable raster’ I described for C64 earlier, but on PC it […]