I was thinking some more about this recently. Specifically, APIs are generally implemented as DLLs or shared libraries which you load into your process and which expose functions that can simply be called. But in a world with no DLLs, how do you do API calls?
At a fundamental level, there are only two things a process can do to communicate: write to some memory shared with another process, and make system calls. The former is useful for setting up calls but not for actually making them, as the process implementing the API has no way to notice when the memory changes (unless it polls, which wastes CPU time). But system calls are expensive - the context switch alone can cost many thousands of CPU cycles.
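To make that concrete, here is a minimal sketch of a single cross-process call built from those two primitives, using POSIX shared memory and named semaphores. The names (/api_shm, /api_req, /api_done) and the fixed-size call slot are invented for illustration, and error handling is omitted:

```c
#include <fcntl.h>
#include <semaphore.h>
#include <string.h>
#include <sys/mman.h>

/* One call slot in a page shared with the server process. */
struct call_slot {
    int  function;      /* which API function to invoke */
    char args[248];     /* marshalled arguments */
};

int main(void) {
    /* Map the shared page that the server also has mapped. */
    int fd = shm_open("/api_shm", O_RDWR, 0600);
    struct call_slot *slot = mmap(NULL, sizeof *slot,
                                  PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    /* Named semaphores the server sleeps on. */
    sem_t *request = sem_open("/api_req", 0);
    sem_t *done    = sem_open("/api_done", 0);

    slot->function = 42;            /* writing the arguments is essentially free... */
    strcpy(slot->args, "hello");
    sem_post(request);              /* ...but waking the server is a system call */
    sem_wait(done);                 /* ...and so is sleeping until it replies */
    return 0;
}
```

However cheap the shared-memory part gets, that sem_post/sem_wait pair is the per-call cost that the rest of this post is about amortizing.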
I mentioned in the linked post that bandwidth for inter-process communication should not be a problem, but I neglected to mention latency - if you can only make API calls a few tens of thousands of times per second, certain operations could become very slow.
However, the latency of any single call is still far below what a human being can notice - the only perceived speed problems caused by this architecture would arise if many thousands of API calls were made as a result of a single user action.
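To put rough numbers on that: suppose a round trip costs 30 microseconds, which works out to about 33,000 calls per second. One call, or even a hundred, is imperceptible, but 10,000 calls triggered by a single button click would add around 0.3 seconds - right at the point where an interface starts to feel sluggish.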
I think the answer is to design APIs that aren't "chatty". Such designs are already common in database programming, where high-latency APIs have been the rule for a long time. Instead of making getCount() and getItemAtIndex() calls to retrieve a potentially large array of data, you retrieve a "cursor" which can return many records at once.
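Here is a sketch of what that looks like from the caller's side. The cursor is stubbed with an in-memory array so the example runs standalone; in the architecture above, each fetch() would be one IPC round trip. All the names are invented:

```c
#include <stdio.h>
#include <stddef.h>

struct record { int id; };

/* Stand-in for an opaque handle owned by the API on the far side
 * of the process boundary. */
struct cursor { int next, total; };

struct cursor open_cursor(int total) {
    struct cursor c = { 0, total };
    return c;
}

/* Fills buf with up to capacity records and returns how many were
 * written; one round trip carries a whole batch instead of one item. */
size_t fetch(struct cursor *c, struct record *buf, size_t capacity) {
    size_t n = 0;
    while (n < capacity && c->next < c->total)
        buf[n++].id = c->next++;
    return n;
}

int main(void) {
    struct record batch[256];
    struct cursor c = open_cursor(1000);
    for (size_t n; (n = fetch(&c, batch, 256)) > 0; )
        printf("got a batch of %zu records\n", n);
    return 0;
}
```

With a batch size of 256, a million records cost about 4,000 round trips instead of a million - the latency problem shrinks by the batch size.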
Another possibility is for APIs themselves to be able to execute simple programs. Again, this is an idea used by databases (SQL, the language usually used to access databases, is a programming language in its own right). Such programs should not be native code (since that gives all the same problems as DLLs, just backwards) but could be interpreted, or written in some sort of verifiable bytecode (which could even be JIT compiled for some crazy scenarios). The language they are written in need not even be Turing-complete - this is just an optimization, not a programming environment.
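As a toy example of such a non-Turing-complete language, here is a little stack machine whose only control flow is a forward skip, so every program provably halts after at most one pass over its instructions. The opcode set is invented for this sketch:

```c
#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Execution only ever moves forward, so programs cannot loop and
 * the API can run untrusted ones without a watchdog. */
enum op { OP_PUSH, OP_ADD, OP_MUL, OP_SKIP_IF_ZERO, OP_END };

int64_t run(const int64_t *code, size_t len) {
    int64_t stack[64];
    size_t sp = 0;
    for (size_t pc = 0; pc < len; ) {
        switch (code[pc++]) {
        case OP_PUSH: stack[sp++] = code[pc++]; break;
        case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL:  sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_SKIP_IF_ZERO:               /* forward jump only */
            pc += (stack[--sp] == 0) ? (size_t)code[pc] + 1 : 1;
            break;
        case OP_END:  return stack[sp - 1];
        }
    }
    return sp ? stack[sp - 1] : 0;
}

int main(void) {
    /* (2 + 3) * 4, sent to the "API" as one program instead of
     * three separate calls. */
    const int64_t prog[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD,
                             OP_PUSH, 4, OP_MUL, OP_END };
    printf("%lld\n", (long long)run(prog, sizeof prog / sizeof *prog));
    return 0;
}
```

Verifying a program in a language like this amounts to a single linear scan over its instructions, which is exactly the kind of property a verifiable bytecode needs.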
If all else fails, there is still another way to get low-latency, high-bandwidth communication between chunks of code that don't know about each other's existence, and that is to create a whole new process. The APIs involved return a chunk of code (perhaps in the form described here) and the calling program compiles these chunks (along with some code of its own) into a temporary chunk of binary code, which then gets an entirely new process to execute in. This moves potentially all of the IPC cost to a one-time startup phase. The resulting design resembles Java or .NET in some respects.
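A toy illustration of the idea, using C source strings where such a system would presumably hand back verifiable bytecode, and the system C compiler standing in for a trusted in-OS compiler:

```c
#include <stdio.h>
#include <stdlib.h>

/* Chunks "returned" by two APIs that know nothing about each other. */
static const char *chunk_from_api_a =
    "int a_next(void) { static int n; return n++; }\n";
static const char *chunk_from_api_b =
    "#include <stdio.h>\n"
    "int a_next(void);\n"
    "void b_print(void) { printf(\"%d\\n\", a_next()); }\n";

int main(void) {
    /* One-time cost: glue the chunks and some driver code together... */
    FILE *src = fopen("combined.c", "w");
    fprintf(src, "%s%s", chunk_from_api_a, chunk_from_api_b);
    fprintf(src, "void b_print(void);\n"
                 "int main(void) { for (int i = 0; i < 3; i++) b_print(); }\n");
    fclose(src);

    /* ...compile them into a temporary binary... */
    if (system("cc combined.c -o combined") != 0) return 1;

    /* ...and run it in an entirely new process. Inside it, a_next and
     * b_print reach each other as plain function calls - no IPC at all. */
    return system("./combined");
}
```

All the latency has been paid up front; once the new process is running, the two APIs' code communicates at ordinary function-call speed.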
This is kind of like the opposite of the Singularity OS - reliability guarantees are still enforced by a compiler, but the OS allows you to use all the user-mode features of your CPU (not just the ones supported by your runtime) and processes are isolated by the MMU rather than by software.