For my TV emulation, I wanted to render scanlines nicely and at any resolution. xanalogtv does vertical rescaling by duplicating rows of pixels, which unfortunately makes some scanlines appear wider than others. Blargg's NTSC filters don't do any vertical rescaling at all.
The first thing I tried was a sinc interpolation filter with the kernel scaled so that each scanline covered only 70% of the pixels vertically (essentially modelling the scanlines as long, thin rectangles). This worked great except that it was far too slow because of the sinc function's infinite extent (I was doing a multiplication for each combination of horizontal position, vertical position and scanline). So I windowed the kernel with a Lanczos window. With fewer than 3 lobes I got annoying aliasing effects, and with 3 lobes it was still too slow because each pixel was a weighted sum of 3-4 separate scanlines. Also, because of the negative lobes I needed extra headroom, which meant I either had to reduce my colour resolution or use more than 8 bits per sample (which would also be slow).
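A minimal sketch of that windowed-sinc weight, to make the setup concrete (the function names, the exact scaling convention for the 70% width, and the default parameters are my illustrative choices, not taken from the actual code):

```python
import math

def sinc(x):
    # Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1.
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def lanczos(x, lobes=3):
    # Sinc windowed by a stretched sinc (the Lanczos window);
    # zero outside +/- lobes, so the kernel has finite extent.
    if abs(x) >= lobes:
        return 0.0
    return sinc(x) * sinc(x / lobes)

def scanline_weight(dy, width=0.7, lobes=3):
    # dy: vertical distance (in scanlines) from the pixel centre to
    # the scanline centre. Dividing by width narrows the kernel so a
    # scanline covers only ~70% of the line spacing, as described
    # above. The precise scaling convention is an assumption.
    return lanczos(dy / width, lobes)
```

Note that `lanczos(1.5)` is negative - those are the negative lobes that forced the extra headroom mentioned above.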
The next thing I tried was a Gaussian kernel. This has several nice features:
- The Fourier Transform of a Gaussian is also a Gaussian, and a Gaussian is a better approximation of a scanline than a rectangle (the focussing of the electron beam isn't perfect, so to a first approximation the electrons' distribution around the beam center is normal).
- It dies off much more quickly than the sinc function.
The Gaussian kernel also gave a good image, so I kept it.
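The Gaussian weight itself is a one-liner (the sigma value here is illustrative - the original's beam width isn't given):

```python
import math

def gaussian_weight(dy, sigma=0.35):
    # dy: vertical distance (in scanlines) from the pixel centre to
    # the scanline centre. sigma sets the apparent beam width; 0.35
    # is an illustrative value, not the one the emulator uses.
    return math.exp(-dy * dy / (2.0 * sigma * sigma))
```

The quick die-off is easy to see: at two scanlines away the weight is already well below anything representable in 8 bits, so the kernel can be truncated aggressively without visible error.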
The next thing I wanted to do was improve the speed. I still had several scanlines contributing to every pixel. However, that doesn't make much physical sense - the scanlines don't really overlap (in fact there is a small gap between them) - so I figured I should be able to get away with using only the highest coefficient that applies to each pixel. I tried this and it worked beautifully - no difference in the image at large sizes, and it sped the program up by a factor of several. The downside was at small sizes - the image was too dark. This is because the filter was set up so that each pixel would be the average of several scanlines, but if only one scanline contributes then the brightness is 1/several. To fix this I just divided all the coefficients by the largest. There's no mathematical justification for this, but it looks fine (apart from the fact that some of the scanlines don't contribute to the picture at all).
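A sketch of that keep-only-the-largest-coefficient trick, including the divide-by-the-maximum fix (the Gaussian profile and its sigma are assumptions carried over from above; the real code presumably works on integer coefficient tables):

```python
import math

def gaussian(dy, sigma=0.35):
    # Illustrative beam profile; sigma is an assumption.
    return math.exp(-dy * dy / (2.0 * sigma * sigma))

def scanline_coefficients(pixel_ys, scanline_ys, weight=gaussian):
    # For each output pixel, keep only the scanline with the largest
    # weight (the scanlines don't really overlap, so only the nearest
    # one matters), then divide every kept weight by the overall
    # maximum so that pixels centred on a scanline reach full
    # brightness even when just one scanline contributes.
    best = []
    for py in pixel_ys:
        w, s = max((weight(py - sy), i)
                   for i, sy in enumerate(scanline_ys))
        best.append((s, w))
    peak = max(w for _, w in best)
    return [(s, w / peak) for s, w in best]
```

With pixels at 0.0, 0.5 and 1.0 and scanlines at 0.0 and 1.0, the pixels sitting on scanline centres get coefficient 1.0 and the in-between pixel gets a dimmer weight from its nearest line - which is exactly the dark-gap-between-scanlines look.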
If each pixel is only in one scanline, lots more optimizations are possible - for example, one can generate the image progressively, a scanline at a time, which helps keep data in the caches.
Finally, I still needed it to be faster, so I moved all the rescaling (vertical and horizontal) to the GPU. I came up with a devilishly clever hack to implement the same scanline algorithm on the GPU. No shader is needed - it can be done just using textures and alpha blending. There are two passes - the first draws the actual video data. The second alpha-blends a dark texture over the top for the scanlines. This texture is 1 texel wide and as many texels high as there are pixels vertically.
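One plausible way to precompute the alpha channel of that darkening texture, reusing the Gaussian profile from earlier (sigma and the exact alpha mapping are my assumptions):

```python
import math

def scanline_mask(height, scanlines, sigma=0.35):
    # Alpha values for a 1-texel-wide, height-texel-tall black
    # texture. Alpha near 0 at a scanline centre (the video shows
    # through at full brightness), alpha near 1 in the gaps (the
    # pixel is blended towards black). sigma is illustrative.
    alphas = []
    for y in range(height):
        py = (y + 0.5) * scanlines / height  # pixel centre, in scanline units
        d = min(abs(py - (s + 0.5)) for s in range(scanlines))
        alphas.append(1.0 - math.exp(-d * d / (2.0 * sigma * sigma)))
    return alphas
```

Because the mask depends only on vertical position, a single 1-texel-wide column stretched across the whole frame does the job.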
One other complication is that I wanted the video data texture to be linearly interpolated horizontally but nearest-neighbour interpolated vertically. I did this by drawing the texture on a geometry consisting of a number of horizontal stripes, each of which has the same v texture coordinate at its top as at its bottom.
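The stripe geometry might be generated like this (normalised coordinates and the (x, y, u, v) quad layout are my illustrative choices):

```python
def stripe_quads(lines, width=1.0, height=1.0):
    # One horizontal stripe per video line. Each stripe's top and
    # bottom edges share the same v coordinate (the line's centre),
    # so the hardware's bilinear filter has nothing to interpolate
    # between vertically, while u still varies across the stripe and
    # gets linear interpolation horizontally.
    quads = []
    for i in range(lines):
        v = (i + 0.5) / lines            # constant v for this stripe
        y0 = height * i / lines
        y1 = height * (i + 1) / lines
        # Corners as (x, y, u, v), wound counter-clockwise.
        quads.append([(0.0, y0, 0.0, v), (width, y0, 1.0, v),
                      (width, y1, 1.0, v), (0.0, y1, 0.0, v)])
    return quads
```

Every corner of a given stripe samples the same video line, which is exactly the nearest-neighbour behaviour wanted vertically, without ever switching the texture's filtering mode.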