When I started to get deeper into Linux programming, I found it fascinating to learn about the design decisions that were made differently in Windows and Linux, and the history behind them.
One example in particular is how dynamic linking is done. Windows dynamic link libraries (DLLs) are compiled and linked to load at a particular address in memory. When a DLL is loaded at its preferred address, the loader just inserts the addresses of its exported functions into the import tables of the modules that use it. A DLL can be loaded at a different address (in the case that its preferred address is already in use), but then the loader must relocate it, which is an expensive operation and prevents the DLL's code from being shared with other processes. Once loaded, however, code in a DLL is no slower than any other code on the system (calls from one module to another are slightly slower than calls within a module, though, since there is a level of indirection involved).
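As a rough sketch of the Windows side (the keywords and switches below are MSVC-specific, and the file names and base address are made up for illustration), the preferred load address is fixed at link time and the exported functions are what the loader patches into each importer:

    /* mylib.c - built as a DLL, e.g. with MSVC:
     *   cl /LD mylib.c /link /BASE:0x10000000
     * The /BASE switch sets the preferred load address. */
    __declspec(dllexport) int add(int a, int b)
    {
        return a + b;
    }

    /* main.c - uses the DLL; the loader fills in the real address of
     * add() in this module's import table when mylib.dll is loaded. */
    __declspec(dllimport) int add(int a, int b);

    int main(void)
    {
        return add(2, 3);
    }

If mylib.dll cannot be loaded at 0x10000000, every absolute address in it has to be fixed up, which is the relocation cost described above.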
With Linux, on the other hand, the equivalent (shared object files) are compiled to use position independent code, so they can be loaded anywhere in memory without relocation. This improves process startup speed at the expense of runtime speed - because absolute addresses cannot be used, in situations where they would otherwise be used the load address must be found and added in.
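A minimal sketch of the Linux side (the library and symbol names are invented; the gcc flags are the conventional ones for building a shared object):

    /* libcount.c - built position-independent, e.g.:
     *   gcc -fPIC -shared -o libcount.so libcount.c
     * Because this code must work at whatever address it is loaded,
     * references to the exported global below go through the global
     * offset table rather than using an absolute address baked into
     * the instructions - that indirection is the runtime cost. */
    int call_count = 0;

    int bump(void)
    {
        return ++call_count;
    }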
Another way Linux improves startup speed at the expense of runtime speed is lazy binding of function calls. In a Linux process, a call to a shared library is not normally resolved until the first time it is called. Function pointers initially point to the resolver, and when a function is resolved that pointer is replaced with the found function pointer. This way, the loader doesn't spend any time resolving functions that are never called.
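The same trade-off is exposed explicitly by the dlopen() interface: RTLD_LAZY defers symbol resolution until a symbol is first used, while RTLD_NOW resolves everything at load time (and setting LD_BIND_NOW in the environment forces eager binding for ordinary shared-library calls too). A small sketch, using libm purely as a convenient library to load:

    /* lazy.c - build with: gcc lazy.c -ldl */
    #include <dlfcn.h>
    #include <stdio.h>

    int main(void)
    {
        /* RTLD_LAZY: defer resolution; swap in RTLD_NOW to pay the
         * whole resolution cost up front instead. */
        void *lib = dlopen("libm.so.6", RTLD_LAZY);
        if (!lib) {
            fprintf(stderr, "%s\n", dlerror());
            return 1;
        }

        double (*cosine)(double) = (double (*)(double))dlsym(lib, "cos");
        if (cosine)
            printf("cos(0) = %f\n", cosine(0.0));

        dlclose(lib);
        return 0;
    }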
It makes perfect sense that Linux should sacrifice runtime speed for startup speed given the history of Unix. The first versions of Unix had no multithreading - each thread of execution (process) had its own memory space. So you needed to be able to start processes as quickly as possible. With the fork() system call, a new process could be created by duplicating an existing process, an operation which just involved copying some kernel structures (the program's data pages could be made copy-on-write). Because process startup was (relatively) light, and because of Unix's philosophy that a program's responsibilities should be as limited as possible (and that complex systems should be made out of a number of small programs), processes tend to proliferate on operating systems modelled on Unix to a much greater extent than on Windows.
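A minimal fork() sketch, just to show how little ceremony process creation involves (the messages printed are of course only for illustration):

    /* spawn.c - duplicate the current process. The child starts out
     * sharing the parent's pages copy-on-write, which is what keeps
     * the operation cheap. */
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            return 1;
        }
        if (pid == 0) {
            /* Child: in the traditional Unix pattern this would exec()
             * some small, single-purpose program. */
            printf("child %d running\n", (int)getpid());
            return 0;
        }
        waitpid(pid, NULL, 0);   /* Parent: wait for the child to finish. */
        printf("parent %d done\n", (int)getpid());
        return 0;
    }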
However, this does mean that a program developed with Windows in mind (as a single monolithic process) will tend to run faster on Windows than on Linux, and a program developed with Linux in mind (as many small cooperating processes continually being created and destroyed) will tend to run faster on Linux than on Windows.
Another way in which Linux and Windows differ is how they deal with low memory situations. On Linux, a system called the "OOM killer" (Out Of Memory killer) comes into play. The assumption is that if a machine is running too low on memory, some process or other has gone haywire and is using it all. The OOM killer tries to figure out which process that is (based on which processes are using a lot of memory, and which critical system processes are trusted not to go haywire) and terminates it. Unfortunately it doesn't always seem to make the right choice, and I have seen Linux machines become unstable after they run out of memory and the OOM killer kills the wrong thing.
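If you know in advance which processes matter, you can nudge the heuristic yourself. A sketch, assuming a kernel recent enough to expose oom_score_adj in /proc (older kernels used a file called oom_adj with a different scale):

    /* protect.c - make this process a less attractive victim for the
     * OOM killer. Scores run from -1000 (never kill) to 1000 (kill
     * first); lowering the score usually requires root privileges. */
    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/proc/self/oom_score_adj", "w");
        if (!f) {
            perror("oom_score_adj");
            return 1;
        }
        fprintf(f, "-500\n");
        fclose(f);
        /* ... carry on with the work we'd rather not have killed ... */
        return 0;
    }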
Windows has no OOM killer - it will just keep swapping memory to disk and back until you get bored and kill the offending process yourself or reboot the machine. It's very easy to bring a Windows machine to its knees this way - just allocate more virtual address space than there is physical RAM and cycle through it, modifying each page as rapidly as possible. Everything else quickly gets swapped out, meaning that even bringing up the task manager to kill the program takes forever.
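A sketch of the kind of pathological program described above - the 32GB figure is arbitrary, just pick something comfortably bigger than physical RAM, and obviously don't run this on a machine you care about. On Linux the same program tends to summon the OOM killer fairly quickly instead:

    /* hog.c - allocate more than physical RAM and keep dirtying every
     * page, so the OS can never keep the working set resident. */
    #include <stdlib.h>

    #define SIZE  (32ULL * 1024 * 1024 * 1024)   /* larger than RAM */
    #define PAGE  4096ULL

    int main(void)
    {
        unsigned char *p = malloc(SIZE);
        if (!p)
            return 1;
        for (;;) {
            /* Touch (and modify) each page as fast as possible so it
             * has to be written back out before it can be reclaimed. */
            for (unsigned long long i = 0; i < SIZE; i += PAGE)
                p[i]++;
        }
    }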