After I posted 8086 microcode disassembled, Ken Shirriff sent me a high-resolution image of the microcode ROM from the 80386. I didn't expect I would ever do anything with it for a couple of reasons: one is that it's absolutely huge (94720 bits) compared to the 8086 one (10752 bits) so (even with bitract or similar) it would be extremely tedious to transcode and check. The other reason is that I wouldn't know where to start with it - at least with the 8086 there was a patent which gave the general outline and some chunks of code which I could search for. The 80386 was a complete black box. I knew what it did and had a rough idea of how it might work, but turning that into something that I could search for in a big blob of binary seemed like an insurmountable challenge.
Some years later, I was talking to GloriousCow and Smartest Blob (possibly amongst others) on Discord and they mentioned that it would be interesting to get high resolution images of the 80386 die and try to extract the microcode from it. I mentioned that the first part had already been done but that turning the image into a binary blob and a binary blob into intelligible microcode seemed too hard. Well, they may have taken that as a bit of a challenge - they threw various bits of image processing, neural networks, and human-aided automation at the problem and a few days later had the binary blob extracted from the image and cross-checked.
Disassembling it was still quite a challenge, though! We found various patterns and gradually figured out how to rearrange it into μ-ops on one axis and μ-op bits on the other. Then on the order in which to read the μ-ops (helped by a block of unused μ-ops at one end). And how to divide up the μ-op bits into fields. From the 8086 microcode work I assumed that two of the fields would be source and destination registers to copy from. I also knew that the 80386 could do an ALU operation in 2 cycles, suggesting that there had to be a field to specify a second input to the ALU in order that the microcode for these operations could load both operands to the ALU in the first cycle and then the output to the destination on the second cycle. There was also a pattern that occurred with some regularity that we suspected might indicate the end of an instruction (we were right).
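To make the "divide up the μ-op bits into fields" step concrete, here is a minimal sketch of how a single μ-op word can be split once a field layout has been guessed. The field names, bit positions and widths below are purely hypothetical placeholders chosen for illustration; they are not the real 80386 layout, which is documented in the disassembly repository.

```python
from typing import Dict, List, Tuple

# (name, lowest bit, width in bits).
# HYPOTHETICAL layout for illustration only, not the real 80386 fields.
FIELDS: List[Tuple[str, int, int]] = [
    ("src", 0, 5),     # assumed: register to copy from
    ("dst", 5, 5),     # assumed: register to copy to
    ("alu_in", 10, 5), # assumed: second ALU input
]


def split_fields(word: int, fields: List[Tuple[str, int, int]] = FIELDS) -> Dict[str, int]:
    """Extract each named bit field from one micro-op word."""
    result = {}
    for name, lsb, width in fields:
        mask = (1 << width) - 1          # 'width' low bits set
        result[name] = (word >> lsb) & mask
    return result


# Build a word with src=1, dst=2, alu_in=3 and take it apart again.
example_word = (3 << 10) | (2 << 5) | 1
print(split_fields(example_word))
```

Trying a candidate layout like this against every μ-op and looking at which field values recur is one way to test whether a guessed boundary is plausible.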
Ken helped too by tracing various lines and bits of logic on the 80386 die so that we could see how things were connected up. Gradually the picture became clearer. Each time we figured something out it gave a clue as to the meaning of other chunks of microcode that used the same construct. At the same time we were working on decoding the instruction decoder (which consists of multiple smaller PLAs) and the protection test PLA. Eventually we got to the point where we could associate 386 instructions with chunks of microcode, and things became much clearer.
The 80386 is much faster on a per-cycle basis than the 8086 for most instructions, a feat which it achieves by throwing a lot more transistors at the problem - many algorithms which are implemented by microcode in the 8086 are essentially "hardware accelerated" in the 80386 so I realised early on that more of the 80386 microcode would be setting up these accelerators instead of embodying algorithms directly. Figuring out the interfaces between the accelerators (like the multiply and divide hardware, the barrel shifter, and the protection test unit) and the microcode was a lot of the work.
How many different instructions does the 80386 have, according to the microcode? What are they?
The microcode has 215 entry points from the decoding ROM - quite an increase over the 60 of the 8086! Part of this is new instructions, and part is that instructions are handled by different routines depending on such things as whether their operands are registers or memory, whether the CPU is in real or protected mode, and whether REP prefixes are in operation. I won't list them all here but you can find them in the fields.txt file if you're interested (along with all the subroutines and shared code). It's not very meaningful to list the top-level microcode routine size since many of them do a small amount of work and then jump to a routine shared with another entry point. It's also not meaningful to list the number of opcodes each entry point handles, as the instruction decoder uses more than just the opcode to determine which routine to use.
Are there any instructions not handled by the microcode?
Surprisingly, no! Unlike the 8086 (and also unlike modern CPUs), the 80386 is always executing a μ-op and there is microcode for every instruction.
Does the microcode contain any "junk code" that doesn't do anything?
The routine from 0x849 to 0x856 inclusive (marked as "unused?" in the microcode disassembly) doesn't seem to have any entry points associated with it. I'm not completely sure what it does, but it has a lot in common with the #PF (PAGE_FAULT) routine at 0x8e9-0x8f5 - both end up doing an interrupt 0x0e with the error code set to the last error code from the paging unit. But this routine sets CR2 to some mysterious value from the paging unit instead of the fault linear address. All the other microcode seems to be designed to implement the documented behaviour of the CPU (or undocumented behaviour in the case of the routines that handle interaction with the ICE (In-Circuit Emulator) hardware used for low-level debugging).
Does the microcode have any hidden features, opcodes or easter eggs that have not yet been documented?
I am not totally sure about this as I don't have a real 386 machine to try it on, but I may have found a flaw in the IO permission bitmap handling that was used by some protected-mode OSes to grant user-mode processes limited access to IO ports (a practice that might be considered horrifyingly insecure by modern standards). When a 4-byte port access occurs then it seems like the microcode only checks the permission bits for the first 3 addresses. So if such an access were to be performed at the edge of the IO-port space that the process has permission for, the final byte of the access could erroneously succeed and potentially access some hardware register that the OS did not expect to make user-accessible. This is quite an obscure bug so it is not too surprising that it was missed without the microcode disassembly. However, it is rare for a security bug in such a ubiquitous piece of hardware to go unnoticed for more than 40 years! It is possible that it only happened in some versions of the CPU. Or that I have misunderstood how the routine works and it is actually correct after all. This microcode does not seem to be from an early version of the 80386, though - there is no sign of the XBTS/IBTS instructions except in the decoder.
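The suspected flaw is easier to see with a toy model. The sketch below is not derived from the microcode itself: it is a simplified illustration of an IO permission bitmap check (one bit per port, a set bit meaning "denied") in which the hypothetical `bits_checked` parameter limits how many of the accessed ports are actually tested. Setting it to 3 mimics the behaviour described above, on the assumption that my reading of the routine is right.

```python
from typing import Optional


def io_allowed(bitmap: bytes, port: int, size: int,
               bits_checked: Optional[int] = None) -> bool:
    """Return True if an access of `size` bytes at `port` passes the check.

    bits_checked is a HYPOTHETICAL knob: None tests every byte of the
    access, a smaller number models a check that stops early.
    """
    n = size if bits_checked is None else min(size, bits_checked)
    for p in range(port, port + n):
        byte_index, bit_index = divmod(p, 8)
        if byte_index >= len(bitmap):
            return False                  # ports past the bitmap are denied
        if (bitmap[byte_index] >> bit_index) & 1:
            return False                  # set bit means access is denied
    return True


# Deny everything, then grant only ports 0x100, 0x101 and 0x102.
bitmap = bytearray(b"\xff" * 64)
for granted in (0x100, 0x101, 0x102):
    bitmap[granted // 8] &= ~(1 << (granted % 8)) & 0xFF

print(io_allowed(bitmap, 0x100, 4))                  # full check
print(io_allowed(bitmap, 0x100, 4, bits_checked=3))  # check that stops after 3 ports
```

With the full check the 4-byte access is refused because port 0x103 is not granted; with only three ports tested it is let through, which is the edge case in question.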
How can I learn to understand the microcode disassembly?
nand2mario has written up some excellent blog posts based on the disassembly:
80386 Multiplication and Division
These are probably a good place to start, along with the various files in the git repository for the disassembly.
Where can I download the disassembly?
You can find it at the x86 microcode repository on GitHub. Start with the parts.txt file which says what all the other files are, or microcode_10.txt to jump right into the disassembly.
Credits
Thank you to Daniel Balsom (gloriouscow), Smartest Blob, nand2mario, and Ken Shirriff.
There is a very critical piece of information missing, which is that nowhere does it appear to be mentioned which particular 80386 Stepping or sSpec all this analysis was based on. The 386 had a MASSIVE number of Steppings with different bugs, errata, and even instructions. For example, the IBTS and XBTS you mentioned were supposedly removed in Stepping B1, so your unit had to be newer than that: https://www.pcjs.org/documents/manuals/intel/80386/
I'll repeat what I said in your 8086 Microcode disassembly blog post from a few years ago: It would be very nice to see how Microcode changes between Steppings to see how Intel dealt with bugs and the like back before runtime Microcode updates were a thing.
The day when it becomes possible to make a CPU emulator that can reproduce a Stepping's bugs/errata by loading different Microcode versions would be glorious, indeed.