80386 microcode disassembled

After I posted 8086 microcode disassembled, Ken Shirriff sent me a high-resolution image of the microcode ROM from the 80386. I didn't expect I would ever do anything with it for a couple of reasons: one is that it's absolutely huge (94720 bits) compared to the 8086 one (10752 bits) so (even with bitract or similar) it would be extremely tedious to transcode and check. The other reason is that I wouldn't know where to start with it - at least with the 8086 there was a patent which gave the general outline and some chunks of code which I could search for. The 80386 was a complete black box. I knew what it did and had a rough idea of how it might work, but turning that into something that I could search for in a big blob of binary seemed like an insurmountable challenge.

Some years later, I was talking to GloriousCow and Smartest Blob (possibly amongst others) on Discord and they mentioned that it would be interesting to get high resolution images of the 80386 die and try to extract the microcode from it. I mentioned that the first part had already been done but that turning the image into a binary blob and a binary blob into intelligible microcode seemed too hard. Well, they may have taken that as a bit of a challenge - they threw various bits of image processing, neural networks, and human-aided automation at the problem and a few days later had the binary blob extracted from the image and cross-checked.

Disassembling it was still quite a challenge, though! We found various patterns and gradually figured out how to rearrange it into μ-ops on one axis and μ-op bits on the other. Then we worked out the order in which to read the μ-ops (helped by a block of unused μ-ops at one end), and how to divide up the μ-op bits into fields. From the 8086 microcode work I assumed that two of the fields would be the source and destination registers for a copy. I also knew that the 80386 could do an ALU operation in 2 cycles, suggesting that there had to be a field to specify a second input to the ALU, so that the microcode for these operations could load both operands into the ALU in the first cycle and then write the output to the destination in the second cycle. There was also a pattern that occurred with some regularity that we suspected might indicate the end of an instruction (we were right).
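To make the "μ-ops on one axis, bits on the other" idea concrete, here is a minimal Python sketch. It is not the tooling we used. The 2560 × 37 shape is assumed only because 94720 factors that way, the field names and widths are hypothetical placeholders, and the bits are assumed to already be in logical order (finding that order was the hard part).

```python
# Sketch: arrange a flat ROM bit dump into micro-ops and slice each into fields.
# ASSUMPTIONS: the 2560 x 37 shape is inferred only from 94720 = 2560 * 37;
# the field names and widths below are hypothetical, not the real 80386 layout.

ROM_BITS = 94720
UOP_WIDTH = 37
UOP_COUNT = ROM_BITS // UOP_WIDTH  # 2560 under the assumed width

# Hypothetical fields as (name, width), most significant first; widths sum to 37.
FIELDS = [
    ("source", 6),
    ("dest", 6),
    ("alu_input", 6),
    ("alu_op", 7),
    ("control", 12),
]


def split_uops(bits):
    """Group a flat list of 0/1 values into rows of UOP_WIDTH bits."""
    if len(bits) % UOP_WIDTH != 0:
        raise ValueError("bit count is not a whole number of micro-ops")
    return [bits[i:i + UOP_WIDTH] for i in range(0, len(bits), UOP_WIDTH)]


def decode_fields(uop_bits):
    """Slice one micro-op (a list of 0/1 values) into named integer fields."""
    fields = {}
    pos = 0
    for name, width in FIELDS:
        value = 0
        for bit in uop_bits[pos:pos + width]:
            value = (value << 1) | bit
        fields[name] = value
        pos += width
    return fields
```

With a real dump, a wrong guess at the width or at the field boundaries tends to show up as fields whose values look random, which is one way to check a candidate layout.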

Ken helped too by tracing various lines and bits of logic on the 80386 die so that we could see how things were connected up. Gradually the picture became clearer. Each time we figured something out it gave a clue as to the meaning of other chunks of microcode that used the same construct. At the same time we were working on decoding the instruction decoder (which consists of multiple smaller PLAs) and the protection test PLA. Eventually we got to the point where we could associate 386 instructions with chunks of microcode, and things became much clearer.

The 80386 is much faster on a per-cycle basis than the 8086 for most instructions, a feat which it achieves by throwing a lot more transistors at the problem - many algorithms which are implemented by microcode in the 8086 are essentially "hardware accelerated" in the 80386 so I realised early on that more of the 80386 microcode would be setting up these accelerators instead of embodying algorithms directly. Figuring out the interfaces between the accelerators (like the multiply and divide hardware, the barrel shifter, and the protection test unit) and the microcode was a lot of the work.

How many different instructions does the 80386 have, according to the microcode? What are they?

The microcode has 215 entry points from the decoding ROM - quite an increase over the 60 of the 8086! Part of this is new instructions, and part is that instructions are handled by different routines depending on such things as whether their operands are registers or memory, whether the CPU is in real or protected mode, and whether REP prefixes are in operation. I won't list them all here but you can find them in the fields.txt file if you're interested (along with all the subroutines and shared code). It's not very meaningful to list the top-level microcode routine size since many of them do a small amount of work and then jump to a routine shared with another entry point. It's also not meaningful to list the number of opcodes each entry point handles, as the instruction decoder uses more than just the opcode to determine which routine to use.

Are there any instructions not handled by the microcode?

Surprisingly, no! Unlike the 8086 (and also unlike modern CPUs), the 80386 is always executing a μ-op and there is microcode for every instruction.

Does the microcode contain any "junk code" that doesn't do anything?

The routine from 0x849 to 0x856 inclusive (marked as "unused?" in the microcode disassembly) doesn't seem to have any entry points associated with it. I'm not completely sure what it does, but it has a lot in common with the #PF (PAGE_FAULT) routine at 0x8e9-0x8f5 - both end up doing an interrupt 0x0e with the error code set to the last error code from the paging unit. But this routine sets CR2 to some mysterious value from the paging unit instead of the fault linear address. All the other microcode seems to be designed to implement the documented behaviour of the CPU (or undocumented behaviour in the case of the routines that handle interaction with the ICE (In-Circuit Emulator) hardware used for low-level debugging).

Does the microcode have any hidden features, opcodes or easter eggs that have not yet been documented?

I am not totally sure about this as I don't have a real 386 machine to try it on, but I may have found a flaw in the IO permission bitmap handling that was used by some protected-mode OSes to grant user-mode processes limited access to IO ports (a practice that might be considered horrifyingly insecure by modern standards). When a 4-byte port access occurs, it seems like the microcode only checks the permission bits for the first 3 addresses. So if such an access were to be performed at the edge of the IO-port space that the process has permission for, the final byte of the access could erroneously succeed and potentially access some hardware register that the OS did not expect to make user-accessible. This is quite an obscure bug, so it is not too surprising that it was missed without the microcode disassembly. However, it is rare for a security bug in such a ubiquitous piece of hardware to go unnoticed for more than 40 years! It is possible that it only happened in some versions of the CPU. Or that I have misunderstood how the routine works and it is actually correct after all. This microcode does not seem to be from an early version of the 80386, though - there is no sign of the XBTS/IBTS instructions except in the decoder.
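To illustrate the suspected flaw, here is a small Python model. It is a sketch of the idea only, not a transcription of the microcode: the function names and the `bits_checked` parameter are invented for illustration, and details such as locating the bitmap in the TSS are left out. The documented rule is that every port touched by an access must have its bitmap bit clear. The hypothetical flawed variant looks at only the first 3 bits.

```python
# Model of the IO permission bitmap check: one bit per port, 1 = access denied.
# ASSUMPTION: bits_checked=3 models the *suspected* microcode behaviour for
# 4-byte accesses; this has not been confirmed on real hardware.

def bit_denied(bitmap, port):
    """True if the permission bit for `port` is set (access denied).
    Ports beyond the end of the bitmap are treated as denied."""
    byte_index = port >> 3
    if byte_index >= len(bitmap):
        return True
    return bool((bitmap[byte_index] >> (port & 7)) & 1)


def io_allowed(bitmap, port, size, bits_checked=None):
    """Check an access of `size` bytes starting at `port`.
    With bits_checked=None, every byte of the access is checked (documented rule).
    Otherwise only the first `bits_checked` ports are checked."""
    n = size if bits_checked is None else min(size, bits_checked)
    return not any(bit_denied(bitmap, port + i) for i in range(n))


# Example: a process is granted ports 0x100-0x102 but not 0x103.
example_bitmap = bytearray([0xFF]) * 8192      # deny everything by default
for p in (0x100, 0x101, 0x102):
    example_bitmap[p >> 3] &= ~(1 << (p & 7)) & 0xFF   # clear bit = allow

documented = io_allowed(example_bitmap, 0x100, 4)                  # checks 4 bits
suspected = io_allowed(example_bitmap, 0x100, 4, bits_checked=3)   # checks 3 bits
```

In this model `documented` is `False` because port 0x103 is denied, while `suspected` is `True`, which is exactly the case where the final byte of the access would slip through.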

How can I learn to understand the microcode disassembly?

nand2mario has written up some excellent blog posts based on the disassembly:

80386 Multiplication and Division

80386 Barrel shifter

80386 Protection

80386 Memory Pipeline

These are probably a good place to start, along with the various files in the git repository for the disassembly.

Where can I download the disassembly?

You can find it at the x86 microcode repository on GitHub. Start with the parts.txt file which says what all the other files are, or microcode_10.txt to jump right into the disassembly.

Credits

Thank you to Daniel Balsom (gloriouscow), Smartest Blob, nand2mario, and Ken Shirriff.

12 Responses to “80386 microcode disassembled”

  1. Zir Blazer says:

    There is a very critical piece of information missing, which is that nowhere it appears to be mentioned which particular 80386 Stepping or sSpec all this analysis was based on. The 386 had a MASSIVE amount of Steppings with different bugs, erratas, and even instructions. For example, the IBTS and XBTS you mentioned were supposedly removed in Stepping B1, so your unit had to be newer than that: https://www.pcjs.org/documents/manuals/intel/80386/

    I'll repeat what I said in your 8086 Microcode disassembly blog post from a few years ago: It would be very nice to see how Microcode changes between Steppings to see how Intel dealt with bugs and the like back before runtime Microcode updates were a thing.
    The day where it becomes possible to make a CPU emulator that can reproduce a Stepping's bugs/errata by loading different Microcode versions would be glorious, indeed.

    • GloriousCow says:

      The 386 CPU is either a B0 or B1 stepping. The die photos were made before anyone really realized the exact provenance of the CPU in question was going to be crucially important.

      I'd hesitate to trust Intel documentation too much, for example they said they had removed LOADALL from the 386 at some point, but I found it even on extremely late 386's such as the 386EX, so I don't think that is actually true. It's also possible that an instruction remains in microcode but is disabled via other means.

      It would indeed be nice to compare the results of various steppings. Some of our tools and workflows could be reused to more quickly identify differences since most of the hard work of field identification has already been done, but I think other people are going to need to step up to get the requisite imagery done if that is going to happen.

    • Octocontrabass says:

      I would guess it's from stepping B1, since the microcode initializes EDX to 0x303 after reset.

    • Cassidy says:

      What you're asking for is pretty much impossible after 40 years. The only way to know how the microcode changed over the history of the i386 revisions (without Intel releasing that info from their archives) is to physically have a sample of all revisions. That's just not a realistic ask from hobbyists.

      • Zir Blazer says:

        Impossible? On the contrary, with all the effort that was already done, I would assume that most of the tooling and know-how they have would be reusable to analyse the Microcode from another 80386 Stepping if the project goes in that way. Hardest part seems to be the decapping process, and that early 80386 are collector's pieces so they will be rare and expensive to procure, more so for a destructive purpose. Yet MAME hobbyists were already doing stuff like this like two decades ago, and later it progressed to dumping ROMs from high resolution images:
        https://gurudumps.otenko.com/decap/
        https://caps0ff.blogspot.com/2018/03/taito-c-chip-data-by-lobotomy.html
        https://caps0ff.blogspot.com/2020/04/help-us-preserve-great-swordsman.html
        So no, it is certainly NOT impossible for hobbyists to do this if they're already here. Whether the cost and time investment is worth it for a hobby is a different matter, but I'm optimistic that the skills and tools exist.

        With the whole emulation for preservation trend, I viewed emulating a specific chip Stepping as the final frontier, given that nearly everything else seems to be already done. We are still using sort of "generic CPU emulators" based on datasheet with whatever custom handling for major bugs or erratas is needed for certain Software compatibility, but I believe that emulation will eventually evolve into something similar to what nand2mario FPGA CPUs z8086 and z386 do as they're based on original CPU Microcode gotten from this kind of effort:
        https://nand2mario.github.io/posts/2025/z8086/
        https://nand2mario.github.io/posts/2026/z386/

        I have some kind of obsession with this because there are still many unresolved mysteries involving major differences between CPU Steppings. For example, in the console emulation world, Pilotwings from Super Nintendo was known for an attract demo oddity since it behaves differently depending on the DSP chip revision that the cartridge came with, and it apparently took them a while to figure out what the difference was because the ROM dumps were identical:
        https://www.nintendolife.com/news/2019/05/random_the_captivating_mystery_of_pilotwingsr_crashing_plane

        My personal white whale is what happened during Digital Research Concurrent DOS 286 development, since supposedly it was developed in some early 80286 Stepping where 286 LOADALL behaved in a certain way, then in a later Stepping Intel changed LOADALL behavior, breaking compatibility with what they had already working and delaying release, then in 80286 E2 Stepping it supposedly went back to the original behavior (Or a third behavior that was compatible enough with the first):
        https://books.google.com.ar/books?id=_y4EAAAAMBAJ&lpg=PP1&pg=PA21&redir_esc=y#v=onepage&q&f=false
        I don't recall anyone ever getting into that one, but seems to have wrecked Digital Research time to market in a critical period of PC history. I wonder how many similar to that one exists.

        Perhaps the best documented cases about major differences between CPU Steppings involves the 80386 series, and they're known because Windows checks for those:
        https://www.pcjs.org/blog/2015/02/23/

        Check Windows/386 Loader (WIN386.EXE) section here for some Windows specific stuff related to early 80386 Steppings:
        https://virtuallyfun.com/2025/09/06/unauthorized-windows-386/

        And here with Microsoft Raymond Chen claiming that B1 Stepping didn't support Virtual Memory on first 64 KiB of memory:
        https://devblogs.microsoft.com/oldnewthing/20110112-00/?p=11773

        Section 80386 from Geoff Chappell here:
        https://web.archive.org/web/20230309005959/https://www.geoffchappell.com/studies/windows/km/cpu/identification.htm

        I'm also certain that just like the demoscene do stuff like the 8088 MPH demo which only worked on original IBM PC 5150 with an 8088 or very accurate emulators (Which were apparently nonexistent when it was released), that there are non zero chances that we eventually get such demoscene trolls to flex their muscles by doing demos targetting odd behavior in specific CPU Steppings. I can see that one coming from a mile away.

  2. coderofsalvation says:

    what are the implications of this?
    Does it potentially open doors to x86 as a truly open perma-computing foundation?

    • AzureDiamond says:

      x86 has been fully documented (outside of illegal opcodes) from the start and everything until amd64 at least is patent free. its already possible to create a full open implementation but its way easier to use riscv and emulate x64/x86 in software maybe with some apple silicon style hw acceleration. this project is important work for preservation and learning how cpu tech developed over time, maybe coming up with new retro computing tricks but i dont think its relevant for new hardware.

  3. MrMadguy64 says:

    Would be nice to see reverse-engineering of EGA and VGA. The biggest questions: why skews were needed and why 64Kb modes used different settings.

  4. Lalufu says:

    There have been plenty of bugs and misunderstandings around the whole IO permission bitmap thing before, too
    https://www.os2museum.com/wp/the-history-of-a-security-hole/

  5. MrMadguy64 says:

    Don't get me wrong - I don't want to be invasive. I just want to say, that, I guess, it wouldn't be enough to use oscilloscope and trace all signals to truly reverse-engineer EGA/VGA. Only way to do it - to make microscope photos of their crystals and reverse-engineer their logic.

  6. Octocontrabass says:

    How many of the (known) errata in B1-stepping CPUs are caused by bugs in this microcode? Is that even possible to determine without seeing the microcode from other steppings?
