Bryan O’Sullivan has a beautiful summary of the present state of NVIDIA’s CUDA. He explains the programming model, along with the many different levels of memory and their restrictions (there are many). I had been quite optimistic in my last post about CUDA (just from taking a quick glance at their source code), but Bryan’s much more educated opinion brought me back to earth, as they say. Just a small quote so you can see exactly what I mean:
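To give a flavour of what that memory hierarchy means in practice, here is a minimal CUDA sketch of my own (the kernel name `scale` and the tile size are just illustrative, not from Bryan's post): threads stage data from slow global memory into fast per-block shared memory before working on it, which is exactly the kind of manual bookkeeping that makes the model hard.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: each block copies a tile of the input from global
// (device) memory into on-chip __shared__ memory, synchronizes, then
// computes. Global memory is large but slow and has strict access rules;
// shared memory is fast but tiny and only visible within one block.
__global__ void scale(const float *in, float *out, int n, float factor)
{
    __shared__ float tile[256];                  // per-block shared memory
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Stage global -> shared (pad out-of-range threads so the barrier
    // below is reached by every thread in the block).
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                             // block-wide barrier

    if (i < n)
        out[i] = tile[threadIdx.x] * factor;     // compute in registers
}

// Host side would launch it roughly like:
//   scale<<<(n + 255) / 256, 256>>>(d_in, d_out, n, 2.0f);
```

Even this toy already forces you to think about tile sizes, barriers, and which of the several memory spaces each value lives in.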
People with the expertise, persistence, and bloody-mindedness to keep slogging away will undoubtedly see phenomenal speedups for some application kernels. I’m sure that the DOE and NSA, in particular, are drooling over this stuff, as are the quants on Wall Street. But those groups have a tolerance for pain that is fairly unique. This technology is a long way from anything like true accessibility, even to those already versed with parallel programming using environments like MPI or OpenMP. Still, it’s a great first step.
I have talked to one of our students who started looking into CUDA some time ago, and asked him to compare it to his work on the Cell processor (he did an internship at IBM). His comment was (I forget his exact words, but the spirit is kept):
Working on the Cell was soo much easier!
Oh well. Anyways, I wish I had more time to look into this (even reading and understanding the user manual would take hours I just don’t have at the moment). But having the power of a supercomputer under your desk is just sooo tempting…