Volatile registers in GCC

June 14th, 2010

When writing microcontroller code in assembler, it's nice to be able to keep some commonly used variables in registers. The 8-bit AVR devices in particular have a nice big set of registers, most of which are rarely used in compiled code. Usually it's a waste of time to write all the code in assembler, though - it's much better to write the non-time-critical bits in C, compile them with GCC and then link them with the assembly code bits.

When accessing the register variables from the C code, the natural thing to do is just to make a global register variable:

register uint8_t frame __asm__ ("r3");

That has two effects - it allows you to access the variable as if it were a normal variable:

void initFrame()
{
    frame = 0;
}

And it prevents the compiler from using r3 for anything else (though one also has to be careful that the register isn't used by any other linked-in code, including the C library and the compiler support functions in libgcc).

The trouble comes when you try to read those register variables. If optimizations are turned on, then the following code might just be an infinite loop:

void waitForFrame(int frameToWaitFor)
{
    while (frame != frameToWaitFor);
}

The compiler hoists the read of frame outside the loop, and never sees the updates. If frame were a normal variable we could fix this just by adding volatile, but volatile doesn't work with a register variable. This seems odd until we think about what volatile actually means. A read from or write to a volatile variable is considered an observable effect (like doing IO) so the compiler won't optimize it away. But the compiler has no concept of a "read from" or "write to" a register - registers are just used or not used, and the optimizations around them are unaffected by the notion of volatile.

There is a reasonably easy and not-too-invasive way to fix this, though, through the use of inline assembly. If you write:

#define getFrame() ({ \
    __asm__ volatile ("" : "=r"(frame)); \
    frame; \
})
 
void waitForFrame(int frameToWaitFor)
{
    while (getFrame() != frameToWaitFor);
}

The compiler will treat the "" as a block of assembly code which writes to r3 and which has unknown side effects (so that it can't be hoisted out of a loop for example). The code doesn't actually do anything (so the generated code won't be adversely affected) but it essentially provides a read barrier to the register. Unfortunately you can't use getFrame() to write back to frame, so to increment it for example you have to do frame = getFrame() + 1; but that's actually kind of helpful because it makes the possibility of a race condition (for example by an interrupt routine also incrementing frame at the same time) more obvious.

Improved GPU usage for fractal plotter

November 1st, 2009

I've been tinkering with my fractal plotter again. One thing that annoyed me about it was the pauses when you zoomed in or out past a power of 2. I thought this was due to matrix operations until I did some profiling and discovered that it was actually dilation (both doubling and halving) of the "tiles" of graphical data to which squares are plotted and which themselves are painted to the screen.

This is work that can quite easily be done on the GPU, without even having to resort to pixel shaders, by using the ability to render to a texture. Here is the result and here is the source. In order to do this I moved all the tiles to video memory (default pool instead of managed pool) and used ColorFill() to actually plot blocks instead of locking and writing directly to textures. All this adds up to much more CPU time available for fractal iterations.

Another change is that instead of an array of grids of tiles, I've switched to using a grid of "towers" each of which is itself a tile and can point to 4 other towers. This simplifies the code somewhat.

There is still some glitchiness when zooming but it is much less noticeable now.

This reminds me of something I meant to write about here. When I originally converted my fractal program to use Direct3D, I figured that locking and unlocking textures was probably an expensive operation so rather than locking and unlocking every time I needed to plot a square, I kept them all locked most of the time and just unlocked them to paint. However, it turns out that this "optimization" was actually a terrible pessimization - now all the tiles were dirtied each frame and had to be copied from system memory to video memory for each paint, and because of the locking nothing else could happen during that time. I was able to get a big speed up by locking and unlocking around each plot operation - that caused only the parts of tiles that were actually plotted on to be dirtied. It just goes to show that when optimizing you do have to be careful to actually measure performance and see where the slow bits really are.

Wireless mice

October 31st, 2009

This is fascinating. I wonder how long it will be before people are implanting tiny devices into mouse brains that receive commands from the internet via the cellular networks and transmit video and audio back, so that the mice can be driven around by remote control and used to spy on people and things.

Fall foliage

October 30th, 2009

Inspired by this XKCD comic:

I made this:

And this, going in the opposite direction:

Rotating fractal

October 29th, 2009

Last year, I wrote about a way to make rotating fractals. I implemented this and here is the result:

The equation used was z ← z^1.8 + c, and the branch cut varies from 0 to ~10π.

What are all those pins for?

October 28th, 2009

I recently built myself a new computer using an Intel Core i7 920 CPU. This CPU has more pins (well, "lands" actually, since they are just flat conducting areas that touch pins in the socket) than any other yet produced, 1366 of them to be precise. I was wondering why so many were needed, so I grabbed the datasheet and made a map:

[Pin map legend:

Power: VSS, VCC, VCCPLL, VTTA, VTTD, VDDQ

Memory: DDR0 data and other, DDR1 data and other, DDR2 data and other

Other: QPI data and other, other, reserved]

Idle speculation follows (I don't have any background in CPU or motherboard design):

The pins roughly divide into six sections: two for memory data, one for other memory-related signals, one for power, one for the QPI bus and one that is mostly reserved.

That there are a lot of power pins is not surprising - this CPU can use as much as 145A of current, which is enough to vaporize any one of those tiny connections, so it has to be spread out amongst ~300 of them for each of power and ground. Having two very big pins for power would probably make the mechanical engineering of the CPU much more difficult and would push the responsibility for branching out that power onto the CPU, whereas it is better done by the motherboard.

It's interesting that the ground lands are mostly spread out but the power lands are mostly together. I'm not sure why that should be - I would expect them both to be spread out. Perhaps the 8 or 9 big groups of VCC on the north edge each correspond to a single "power line" on the motherboard (and hence are grouped together) while the distributed ground lands are needed to supply electrons for the signal lands.

Three DDR3 channels also use a lot of lands - 192 for data alone and almost as many again for addresses, strobes and clocks.

Another thing that surprised me is that there are so many reserved lands (~250 of them). Initially I thought that this was because the socket was designed before the designers knew how many pins they would actually need, so they made sure to design for the absolute maximum. However, a good chunk of the reserved lands are used by the Xeon 5500 CPUs, which use the same socket - in particular for memory error detection/correction and the second QPI bus (which is presumably in the northwest corner).

Edit 14th July 2013:

Here are some more nice pin maps.

A trip around the cardioid

October 27th, 2009

Take a point that moves around the edge of the main cardioid of the Mandelbrot set, and plot orbits with values of c close to that point. You end up with this:

It can be thought of as a sequence of cross-sections of the Buddhabrot.

Guns would not be useful against a tyrannical government

October 26th, 2009

Gun enthusiasts in the US often claim that it's important that citizens can bear arms in order to protect against a government that has become tyrannical. However, I don't think that argument really holds water - it seems to me to be a rather outlandish fantasy that a group of citizens could overthrow the government.

For a tyrannical government to have any effect, the power structures between it and the people would still have to be largely in place - in particular, the military and the police would have to be still taking orders from the government. But any given individual citizen gun-owner would be vastly outgunned by the military, which has access to far more powerful weapons. So an extremely large number of individual gun owners would be needed. I have no idea how many, but it would probably have to be several times the size of the US standing army, so in the multiple millions. But if the government failed to convince all those millions of people that it is not a tyranny, how could it have convinced the military and the police?

A far more useful tool against tyranny is an educated and well-informed population. If you can't pull the wool over the eyes of the people, you also can't pull the wool over the eyes of the agencies enforcing the will of the government. For this reason it's far more important that people get accurate and unbiased news than it is that guns are kept legal. If a tyrannical government does emerge (and there are some indications that it already has) it will be because the people have been lied to, not because they don't have enough guns. And frankly, the state of most mainstream news is so bad that this does seem to be a real danger.

It's very important that we all have a good understanding of current affairs. To do this we should:

  • Avoid getting our news from just one source, or from sources with similar bias.
  • Check the facts - follow up on the references and follow the chains of evidence back to the source wherever possible.
  • Know our fallacies
  • Disregard news sources that rely on unsubstantiated rumour ("Some say that...")
  • Be particularly wary of religious arguments, since in religion not only is objective evidence lacking, but searching for it is actively discouraged.

People are colonies

October 25th, 2009

When I first learned that the human body was made up of trillions of cells I was fascinated. These cells are almost like small organisms themselves - they grow, reproduce, consume and respond just as the organism itself does. It's almost as if the human body is a colony, not just an individual. In fact, it seems very likely that the first multicellular organisms were actually colonies of individuals which stuck together and began to evolve as a group, not just as individuals.

Another fascinating fact that I learned recently is that there are more bacterial cells than human cells in a human body - though they are much smaller they are about 10 times more numerous. It's sort of like how we keep animals of different species like cows and chickens in our macroscopic communities.

Even our human cells aren't "pure human" - they contain mitochondria which have their own DNA and almost certainly evolved from a separate line if you go back far enough in history. It's almost like life is fractal (though the self-similarity doesn't descend infinitely).

That makes me wonder if colonies act as individuals on a much larger scale. If we colonise the universe could we end up with societies that are complex enough to have an awareness of their own? Could we ever, as individuals, become aware of this awareness? Presumably (because of the speed of light) such awareness would be much slower than ours, and generations could be born and die in the time it takes for a single thought to happen at the "higher level". However, because we (unlike our cells) are intelligent beings, we could presumably read the writings that such a being had made over the course of history. Such a being would be a God, in a sense, as it would transcend us, but wouldn't necessarily be omnipotent, omniscient or kind, and certainly wouldn't have created the universe.

Top posting

October 24th, 2009

Before I started working at Microsoft, I used to always reply to emails by quoting them, breaking up the quoted text into pieces and then replying to each of the pieces directly below, for example:

From: andrew@reenigne.org
To: xyz@example.com
Subject: Re: Hello

Xyz <xyz@example.com> wrote:
> Hello,

Hello!

> how are you today

I'm fine, thank you.

This style is called inline replying with trimming. This is a fine system because the person I'm replying to gets reminded of what they wrote, and I don't have to write things like "In regards to the part of your email where you asked me how I was today,".

The most common other system is top posting, which looks like this:

From: andrew@reenigne.org
To: xyz@example.com
Subject: Re: Hello

Hello! I'm fine, thank you.

Xyz <xyz@example.com> wrote:
> Hello, how are you today

This is the natural default with Microsoft Outlook. In the geek circles I had moved in before working at Microsoft, this style was greatly frowned upon. However, it is ubiquitous at Microsoft. I'm not sure whether this is because it's the default style in Outlook or whether it's the default style in Outlook because it is ubiquitous at Microsoft. However, once I had forced myself to "do as the Romans do" and top post, I found that it does actually make more sense in that environment. This is for two reasons:

  1. When the conversation falters due to lack of knowledge about something, it's very common to "loop in" an expert to give their two cents by adding them to the CC line. In order for the expert to have some context, it's useful to have the previous history of the conversation right there in the email, so he or she can read it (bottom to top).
  2. With each email carrying the entire thread, emails can get pretty long. It's inconvenient to have to scroll all the way to the bottom of each email to see the latest reply (especially if you're just a spectator rather than a contributor to a busy thread) so it's better for the replies to be at the top than at the bottom.

It's still useful to reply inline as well sometimes - at Microsoft this is done by quoting the email you're replying to twice - once, in its entirety, at the bottom, and once (suitably chopped and trimmed) inline. I used to do this quite frequently as it's the best way I've found (pre-Argulator) of addressing each point individually. However, one of my managers once told me that if the conversation got sufficiently complex that I felt it was best to do that, I should "take it offline" and schedule a face-to-face meeting to hash out the issues. However, I felt (and still feel) that inline email replies are better than face-to-face meetings for such complicated issues - in face-to-face meetings there's less time to think about your answer, and points can get lost: as the conversation progresses it can only follow one "branch" of the argument tree, and without explicitly maintaining a stack it's very easy for branches to get forgotten.