Some years ago, I was tinkering with some sound synthesis code on Windows. This was in the pre-Vista days, and the officially recommended way of doing sound output was DirectSound. My application was interactive, so I wanted to make it as low latency as possible without having too much overhead. I set it up with a buffer size of 40ms (3,528 bytes). I then set up IDirectSoundNotify with two notification positions, one at the middle of the buffer and one at the end. Whenever one was triggered, I would refill the half of the buffer that had just finished playing. This all worked great.
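For concreteness, here is roughly what that setup looked like - a from-memory sketch rather than the original code. FillHalf() is a hypothetical stand-in for the synthesis routine, error handling is omitted, and the buffer is assumed to have been created with the DSBCAPS_CTRLPOSITIONNOTIFY flag (which position notifications require):

```cpp
#include <windows.h>
#include <dsound.h>

void FillHalf(void* dest, DWORD bytes);        // hypothetical synthesis routine

// Double-buffering via IDirectSoundNotify: two notification positions, one
// at the middle of a looping 40ms buffer and one at the end; whichever
// fires, refill the half that just finished playing.
void StreamWithNotify(IDirectSoundBuffer* buffer)
{
    const DWORD kBufferSize = 3528;            // 40ms of audio

    HANDLE events[2] = {
        CreateEvent(nullptr, FALSE, FALSE, nullptr),
        CreateEvent(nullptr, FALSE, FALSE, nullptr),
    };

    IDirectSoundNotify* notify = nullptr;
    buffer->QueryInterface(IID_IDirectSoundNotify, (void**)&notify);
    DSBPOSITIONNOTIFY positions[2] = {
        { kBufferSize / 2 - 1, events[0] },    // middle of the buffer
        { kBufferSize - 1,     events[1] },    // end of the buffer
    };
    notify->SetNotificationPositions(2, positions);

    buffer->Play(0, 0, DSBPLAY_LOOPING);
    for (;;) {
        // When the play cursor passes the middle, refill the first half;
        // when it wraps around at the end, refill the second half.
        DWORD which = WaitForMultipleObjects(2, events, FALSE, INFINITE) - WAIT_OBJECT_0;
        DWORD offset = (which == 0) ? 0 : kBufferSize / 2;

        void* ptr1; DWORD len1; void* ptr2; DWORD len2;
        buffer->Lock(offset, kBufferSize / 2, &ptr1, &len1, &ptr2, &len2, 0);
        FillHalf(ptr1, len1);
        if (ptr2) FillHalf(ptr2, len2);        // in case the lock wrapped
        buffer->Unlock(ptr1, len1, ptr2, len2);
    }
}
```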
At least, it did until I came back to the code some years later, after having upgraded my laptop's OS to Windows Vista (the hardware hadn't changed). Suddenly, this code which had worked great before sounded horrible - the sound was all choppy, as if only half of the buffers were being played. What had happened?
After some experimentation, I discovered that the choppiness happened with buffers smaller than 7,056 bytes (80ms) but not with larger ones. Armed with this evidence and after a bit of pondering, I arrived at a good theory about what happened.
The Windows audio system was drastically rewritten in Windows Vista, and DirectSound was re-implemented - instead of a thin layer over the driver, it became a high-level API implemented on top of the Windows Audio Session API (WASAPI). In doing so, it lost some performance (it's no longer hardware accelerated) and, it seems, suffered an increase in latency - the new DirectSound implementation uses a 40ms internal buffer (I think). That's all very well, but there's a bug in the implementation: IDirectSoundNotify fails to trigger if the notification positions are spaced more closely than this. That would explain the 80ms threshold exactly - with my positions at the middle and end of the buffer, any buffer smaller than 80ms puts them less than 40ms apart.
The preferred API for this kind of thing is now XAudio2, which is actually a little nicer (the code to do the same thing is slightly shorter) and works on both XP and Vista (unlike WASAPI, which is Vista-only). I can't really fault Microsoft too much, since this particular use of IDirectSoundNotify is apparently rather unusual (or they would have made it work), but it's still annoying that DirectSound went from being the recommended API to buggy and (practically, if not technically) deprecated in a single Windows version. Still, I understand that the world of Linux audio is even worse (albeit slowly getting better).
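Here's a sketch of the same double-buffering scheme under XAudio2 - again from memory and simplified, with FillHalf() standing in for the synthesis code. Instead of notification positions, the source voice calls OnBufferEnd whenever a submitted buffer finishes playing, and since buffers play back in submission order we can simply alternate halves. (Depending on the XAudio2 version, CoInitializeEx may need to be called first.)

```cpp
#include <windows.h>
#include <xaudio2.h>

void FillHalf(BYTE* dest, DWORD bytes);        // hypothetical synthesis routine

// Signals an event each time a submitted buffer finishes playing.
class RefillCallback : public IXAudio2VoiceCallback {
public:
    HANDLE done = CreateEvent(nullptr, FALSE, FALSE, nullptr);
    void STDMETHODCALLTYPE OnBufferEnd(void*) override { SetEvent(done); }
    // Required by the interface but unused here.
    void STDMETHODCALLTYPE OnVoiceProcessingPassStart(UINT32) override {}
    void STDMETHODCALLTYPE OnVoiceProcessingPassEnd() override {}
    void STDMETHODCALLTYPE OnStreamEnd() override {}
    void STDMETHODCALLTYPE OnBufferStart(void*) override {}
    void STDMETHODCALLTYPE OnLoopEnd(void*) override {}
    void STDMETHODCALLTYPE OnVoiceError(void*, HRESULT) override {}
};

void StreamWithXAudio2(const WAVEFORMATEX& wfx)
{
    IXAudio2* engine = nullptr;
    XAudio2Create(&engine, 0, XAUDIO2_DEFAULT_PROCESSOR);
    IXAudio2MasteringVoice* master = nullptr;
    engine->CreateMasteringVoice(&master);

    RefillCallback callback;
    IXAudio2SourceVoice* voice = nullptr;
    engine->CreateSourceVoice(&voice, &wfx, 0, XAUDIO2_DEFAULT_FREQ_RATIO, &callback);

    const DWORD kHalfSize = 1764;              // 20ms - half the old 40ms buffer
    static BYTE halves[2][1764];
    int which = 0;

    // Prime both halves, then refill each one as it finishes playing.
    for (int i = 0; i < 2; ++i) {
        FillHalf(halves[i], kHalfSize);
        XAUDIO2_BUFFER buf = {};
        buf.AudioBytes = kHalfSize;
        buf.pAudioData = halves[i];
        voice->SubmitSourceBuffer(&buf);
    }
    voice->Start(0);

    for (;;) {
        WaitForSingleObject(callback.done, INFINITE);
        FillHalf(halves[which], kHalfSize);    // safe: this half just finished
        XAUDIO2_BUFFER buf = {};
        buf.AudioBytes = kHalfSize;
        buf.pAudioData = halves[which];
        voice->SubmitSourceBuffer(&buf);
        which ^= 1;                            // buffers finish in FIFO order
    }
}
```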
I wonder why audio APIs seem to go through so much more churn than graphics APIs, given that audio is so much less complicated and that the hardware isn't really changing much any more. I sometimes wish all the audio APIs were as simple as a "get the next sample" callback API, but I guess this is too CPU intensive, or at least was when the APIs were designed.
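Something like this, say - an entirely hypothetical interface, where the library owns the device and the timing and just asks the application for samples as it needs them:

```cpp
// Purely hypothetical - the whole API I'd like, in three declarations.
// The library pulls one sample at a time and handles buffering internally.
typedef short (*NextSampleFn)(void* userData);

void StartAudio(int sampleRate, NextSampleFn nextSample, void* userData);
void StopAudio();
```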
[...] It takes some 1000-1500 cycles to switch between user mode and kernel mode. So you can’t do a syscall for each sample in a realtime-generated waveform, for example. There are many other ways that an operating system could be simplified if not for [...]
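To put that in perspective: at 44,100 samples per second, 1,000-1,500 cycles per switch comes to some 44-66 million cycles per second (more, counting the return trip) before any actual synthesis happens - a significant fraction of the CPUs of the era these APIs were designed in.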