Avoid distributed computing unless your code is going to be run by a single client with a lot of available hardware. Being able to snarf up CPU cycles from idle hardware sitting around in the user’s house sounds cool but just doesn’t pay off most of the time.
Avoid GPGPU on the Mac until Snow Leopard ships unless you have a really good application for it. OpenCL will make GPGPU a lot more practical and flexible, so trying to shoehorn your computationally expensive code into GLSL or CoreImage today just doesn’t seem worth it.
Using multiple processes is a good idea if the subprograms are already written. … If you’re writing your code from scratch, I don’t recommend it unless you have another good reason to write subprocesses, as it’s difficult and the reward just isn’t there.
For multithreading, concentrate on message passing and operations. Multithreading is never easy, but these help greatly to make it simpler and less error prone.
Good OO design will also help a lot here. It’s vastly easier to multithread an app which has already been decomposed into simple objects with well-defined interfaces and loose coupling between them.
—Mike Ash (emphasis mine, line-breaks added). The article has more detail and is very much worth reading.
One point that this advice really drives home for me is that you need to focus on making good code first, and defer micro-optimizations. If taking the time to clean up some code makes it easier to parallelize, then you are optimizing your code by refactoring it, even if at a micro-level you might be making some of it slower by, say, not caching something that takes O(1) time to compute.
Apple does not sell a Mac that’s not multi-core, and even the iPhone has a CPU and a GPU. There’s no question that optimization means parallelization. And all signs point to computers getting more parallel in the future. Any optimization that hurts parallelization is probably a mistake.