Thinking Parallel

A Blog on Parallel Programming and Concurrency by Michael Suess

Is the Multi-Core Revolution a Hype?

Mark Nelson does not believe in the hype about multi-cores. And he is right about several of his arguments. The world is not going to end if we cannot write our applications to allow for concurrency, that’s for sure. Since I am working on parallel machines all day, it is easy to become a little disconnected from the real world and think everybody has gotten the message and welcomes our new parallel programming overlords. Some of Mark’s arguments are a little shaky, though, as I hope to show you in this article. Is Mark right? I suspect not, but only time will tell.

Let’s go through his arguments one by one (for this, it helps if you read the article in full first, as my argument is harder to understand without the context).

Linux, OS/X, and Windows have all had good support for Symmetrical Multiprocessing (SMP) for some time, and the new multicore chips are designed to work in this environment.

I completely agree here: the problem is not on the operating system’s side of the equation.

Just as an example, using the spiffy Sysinternals Process Explorer, I see that my Windows XP system has 48 processes with 446 threads. Windows O/S is happily farming those 446 threads out to both cores on my system as time becomes available. If I had four cores, we could still keep all of them busy. If I had eight cores, my threads would still be distributed among all of them.

This argument I don’t understand. He claims that he has enough threads on his system to keep even a four-core system busy. Yet at the same time, the CPU monitor he depicts shows a CPU usage of merely 14.4% – which proves that those threads are not really doing anything useful most of the time. Most of them are sleeping and will therefore not be a burden on the CPU anyway. As I see it, Mark’s picture shows that he has nowhere near enough work on his system to keep even his dual-core system from going into power-saving mode. It’s not how many threads there are, it’s how much they do that’s important.
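To make the point concrete, here is a small sketch of my own (in Python, not from either article): a process can hold hundreds of threads while consuming almost no CPU, because a sleeping thread costs nothing until it wakes up.

```python
import threading

def idle_worker(stop):
    # A typical background thread: it spends its life waiting, not computing.
    stop.wait()

stop = threading.Event()
threads = [threading.Thread(target=idle_worker, args=(stop,)) for _ in range(200)]
for t in threads:
    t.start()

# 200 threads exist, but none of them is burning CPU cycles --
# a raw thread count says nothing about how busy the cores are.
assert threading.active_count() >= 200

stop.set()
for t in threads:
    t.join()
```

Run a CPU monitor while this sits in `stop.wait()` and you will see usage near zero, despite the impressive thread count.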

Modern languages like Java support threads and various concurrency issues right out of the box. C++ requires non-standard libraries, but all modern C++ environments worth their salt deal with multithreading in a fairly sane way.

Right and wrong, if you ask me. The mainstream languages do have support for threads now. Whether or not that support is sane is another matter altogether. :smile: I know one thing from looking at my students and my own work: parallel programming today is not easy and it’s very easy to make mistakes. I welcome any effort to change this situation with new languages, tools, libraries or whatever magic is available.
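As a tiny illustration of how easy it is to get wrong (my own Python sketch, not from Mark’s article): incrementing a shared counter from several threads is a read-modify-write race unless you protect it with a lock – forget the lock and updates can silently disappear.

```python
import threading

counter = 0
lock = threading.Lock()

def add(n):
    global counter
    for _ in range(n):
        # Without the lock, two threads can read the same old value
        # and one of the two increments is silently lost.
        with lock:
            counter += 1

threads = [threading.Thread(target=add, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert counter == 40_000  # only guaranteed because of the lock
```

The language gives you the threads; it does not stop you from leaving out the `with lock:` line, and nothing will warn you when you do.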

The task doing heavy computation might be tying up one core, but the O/S can continue running UI and other tasks on other cores, and this really helps with overall responsiveness. At the same time, the computationally intensive thread is getting fewer context switches, and hopefully getting its job done faster.

That’s true. Unfortunately, this does not scale: as we have already seen in the argument above, all the other threads present can run happily on one core without it even running hot. Nobody says that you need parallel programming when you have only two cores. But as soon as you have more, I believe you do.

In this future view, by 2010 we should have the first eight-core systems. In 2014, we’re up to 32 cores. By 2017, we’ve reached an incredible 128 core CPU on a desktop machine.

I can buy an eight-core system today, if I want to. Intel has a package consisting of two quad-core processors and a platform to run them on. I am sure as soon as AMD gets their act together with their quad-cores, they will follow. I am not sure anymore when the first multi-cores were shipped, but this press release suggests it was about two years ago, in 2005. I can buy eight cores now from Intel. Or I can buy chips from Sun with eight cores supporting eight threads each. My reader David points me to an article describing a new chip with 64 cores. Does this mean that the number of cores is going to double each year? If you follow this logic, we arrive at 64 cores in 2010. The truth is probably somewhere in the middle between Mark’s prediction and mine, but I am fairly sure the multi-core revolution is coming upon us a lot faster than he is predicting…

He also pointed out that even if we didn’t have the ability to parallelize linear algorithms, it may well be that advanced compilers could do the job for us.

Obviously, this has not quite worked out as well as expected.

Maybe 15 or 20 years from now we’ll be writing code in some new transaction based language that spreads a program effortlessly across hundreds of cores. Or, more likely, we’ll still be writing code in C++, Java, and .Net, and we’ll have clever tools that accomplish the same result.

I sure hope he is right on this one. Or, on second thought, maybe I would prefer a funky new language with concurrency support built in, instead of being stuck with C++ for twenty more years. :razz:

You have heard my opinion, you have read Mark’s – what’s yours? Is the Multi-Core Revolution a Hype? Looking forward to your comments!

13 Responses to Is the Multi-Core Revolution a Hype? »»


  1. Comment by David | 2007/08/22 at 03:29:44

    Thanks for the article, good discussion.

    In my opinion, we have mainstream, general purpose, (C-based) programmable processors with 128 cores. Should have 256-512 of ‘em by the launch of G92 this year.

    What’s important for me is to think about which data-parallel operations/algorithms will constitute the majority of consumer/scientific demand, and how well these map to hardware limitations in terms of hardware latency and thread management.

    Both nVidia and AMD have CPU-GPU “fusion” processors in their future. Whether these materialize as simply low-latency, high-bandwidth interconnects between two separate entities, or as specialized SIMD hardware built into the “CPU” itself, it doesn’t change the fact that the algorithms have to match the hardware capabilities. Here are the top applications that come to my mind which would immediately benefit from more than a few cores:

    “Mainstream” Apps
    - 3D gaming/rendering
    - Video Rendering/Encoding/Decoding
    - Physics calculations for hundreds/thousands of objects
    - And all server-side programming ;)

    “Scientific” Apps
    - Medical image reconstruction/registration/visualization
    - FEM (fluid dynamics, mechanics, thermodynamics)
    - FFT
    - Linear Algebra
    - Monte Carlo

    I’d be interested in hearing from others what other “mainstream” applications they think are either out there now or on the horizon.

  2. Comment by Mark Nelson | 2007/08/22 at 14:15:45

    Hi Michael,

    I don’t really have any fight with anything you’ve said here. I will quibble a bit on this:

    >can buy an eight-core system today, if I want to.

    Yes, you can, but when examining the growth in the number of cores, I think it’s best to look at the fat part of the market rather than at outlying values. Right now the *vast* majority of multicore CPUs shipped are dual core. What do you guess, maybe 99%?

    So I think it’s reasonable that if we are tracking the growth of N, right now N=2. I don’t think N will be 4 for at least another 12 months.

    As for my system running at 14%, true, it wasn’t busy when I took the snapshot. I was more interested in the thread count. When my system is actually humming, like in the middle of a big build, it is true that I will usually only see a few threads maxing out their CPU time. I think that as long as that number hovers around the range of N, things are still working okay with the old paradigms.

    A lot of people are missing one point I tried to make, which is this: once N gets big, we will need new paradigms. I just don’t know how big it has to be before we start having real trouble.

    It is only my personal opinion, but I don’t think we (human brains) are good at parallelization, so I think machines are going to do it. (I don’t know how.) If that effort fails, CPU makers will basically be stuck – adding additional cores won’t help system performance, so they’ll have to return to enhancing performance the way they’ve done for the last 30 years – faster clocks, more powerful instruction sets, better pipelining, etc. If that doesn’t work, we’ve hit a wall.

    And to reiterate, I see the wall coming, I just don’t think it’s that close yet. So when I say “Don’t Panic”, don’t confuse that with “Don’t do anything.”

  3. Comment by JP | 2007/08/22 at 17:27:11

    I think server-side applications, games and scientific applications already cope quite well with many cores. The biggest challenge I see is to get standard applications such as browsers, office suites, IDEs, mail readers etc. to benefit from many cores. These are the applications that eat most of my desktop CPU cycles – and being a human I focus on a single application at a time (which I want to run at maximum speed), not on 4 or 8. So being able to run 4 or 8 CPU-intensive apps concurrently is nice, but not what I need most of the time.

    Especially on an OS like Windows, which has the notion of a ‘UI thread’ that needs to handle all window messages, it is really hard for such an application to distribute work among many threads. Sure, you can spin off worker threads doing I/O etc., but sooner or later you have to update the UI and have to synchronize with the UI thread again, which can easily become quite tedious. Thus I currently do not see a generally applicable, well-scaling model for these applications to benefit from many cores.
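    The usual workaround JP alludes to can be sketched language-neutrally; below is a minimal Python version of the pattern (an editor’s illustration, the names are mine and not from any Windows API): workers never touch the “UI” directly, they post results to a queue that the UI thread drains at its leisure.

```python
import queue
import threading

results = queue.Queue()  # the only channel back to the "UI thread"

def worker(job_id):
    # Pretend this is slow I/O or computation done off the UI thread.
    results.put((job_id, f"result of job {job_id}"))

for i in range(3):
    threading.Thread(target=worker, args=(i,)).start()

# The "UI thread" (here: the main thread) pulls results out of the
# queue in its own message loop instead of being interrupted by workers.
finished = [results.get() for _ in range(3)]
assert len(finished) == 3
```

    The design choice is exactly the tedium JP describes: all UI updates are serialized through one thread, so the queue hand-off is mandatory rather than optional.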

  4. Comment by David | 2007/08/23 at 07:13:15


    >>Thus I currently do not see a generally applicable, well-scaling model for these applications to benefit from many cores.

    I’m certainly with you on this, at least when only single instances of these apps are running. Considering that hardware CPU virtualization is now ubiquitous, and OpenGL/DirectX virtualization is coming online, I wouldn’t be surprised if application servers for the home and office gain traction quickly. Who’d need actual thin clients…just KVM over wireless/ethernet and each new monitor buys you a new “computer”.

  5. Comment by David | 2007/08/27 at 20:02:45
  6. Comment by Chris O | 2007/10/16 at 05:37:08

    15 years ago when I last studied this in any detail, you could get 8 or 16 CPUs at best for an SMP system because the overhead of cache coherency between a larger number of processors begins to overtake any theoretical performance gains of having more processors.

    Why don’t the CPUs with 32+ cores have cache coherency problems? I’m guessing that they must have greatly simplified caching schemes. Still the GPUs feel more like very fancy microcontrollers as opposed to general-purpose microprocessors. The GPUs are still rather specialized after all.

    Yes, I do believe that multicores come with a lot of hype; many people are making “a mountain out of a molehill” of multithreading. Sometimes it feels like multithreading is a solution looking for a problem. Sure, the wall has been hit: the tricks and advances of the last 30 years (a faster single CPU) can no longer continue, and the common CPU must now have multiple cores.

    That being said, it’s nice to have a machine with more than one processor. However, I/O contention and virtual memory thrashing will always erode any benefits from having extra processor(s).

    On a Windows machine, the possibilities for parallelism are rather limited, but I think this makes the parallelism game much simpler to play. So for Windows you really only have these cases where it is proper to use parallelism:
    1) You want to run a long operation without holding up the UI thread (this makes sense for both single and SMP systems)
    2) You must communicate with another process (again applies to both single and multi)
    3) You have some large amount of data processing that could fit into a scalable solution (applies to multiprocessor of course)

    Regardless of the reason involved for multithreading/parallelism, I believe the programmer is ultimately responsible for knowing what states must be shared across different threads. Use of general-purpose libraries can solve some but not all of these problems, and of course, these libraries will incur their own limitations as well.

    I suppose this blog is mostly about #3, so the options for that are:
    a) SIMD
    b) OpenMP
    c) DIY: create 1 thread / processor that does the heavy computing, these can of course use SIMD, but the coder gets to split up the work/data involved
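    Option (c) can be sketched in a few lines of Python (an editor’s illustration, not Chris’s code; threads are used here for brevity – for CPU-bound work in CPython you would reach for processes instead): the coder splits the data into one chunk per worker and each worker processes its own chunk.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def chunked_sum(data, workers=None):
    """Split `data` into one chunk per worker and sum the chunks in parallel."""
    workers = workers or os.cpu_count() or 1
    size = max(1, (len(data) + workers - 1) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker sums its own chunk; splitting the work/data
        # is the DIY part that falls on the programmer.
        return sum(pool.map(sum, chunks))

assert chunked_sum(list(range(1000))) == sum(range(1000))
```

    The hard part Chris is pointing at is not the thread creation – it is deciding the split so the chunks are balanced and independent.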

    To further complicate things, the quad Xeon chip, for example, has two separate dual-core dies next to each other in the same CPU package. So do we expect intra-CPU communication to be slower when moving off die? The many types of hardware flavors would wreak havoc on designing general-purpose solutions for large-data, scalable processing. Can the compiler or run-time actually minimize CPU waits caused by cache misses and/or memory synchronizations (cache flushes)? (And to complicate things even more, we could consider NUMA systems as well.)

    David: thanks for posting about the applications, that’s an excellent overview.

    Michael: since you’re the academic, what is the true cost of thread synchronization? For example, on an 8-CPU system, suppose CPUs 1 and 5 have their caches accessing the same memory. CPU 1 hits a lock and so must synchronize with CPU 5’s cache, i.e. they both must flush their caches? Is this correct? Is there a way to see cache misses in action? There doesn’t seem to be a perfmon counter for this.

    Many thanks for the great blog!

    -Chris O

  7. Comment by Angelo Pesce | 2007/12/26 at 21:45:47

    Mhm, I think that you are 100% right Michael.

    Maybe it’s true that most programmers are not going to face the problem right now. And surely it is true that, up to a limit, having multiple cores can be exploited automatically by the OS to do things in the background or to be more responsive.

    But still, it’s true that what really matters is not how many threads you have, but how many you are using at a time. Most of the time you are only running one application that is consuming all of your CPU, and it does not matter at all that you have 10 threads in a background service. So that remark was simply stupid.

    And the main problem is not what we have to face now with 2-4-8 cores, but that the tendency is to get more cores (and this is a fact) faster than we get adequate solutions to multithreading (C++ is not ready, and neither is Java).

    And right now I’m working in the videogame industry, where I have to deal with 6-8 hardware threads (X360/PS3), and most videogame developers are trying really hard to refactor their engines to take advantage of that – so it’s not something that can wait for some distant future!

    The only place where multithreading is easy as of now is in numerical kernel code that can be mapped to a stream programming model – that’s why GPUs are so fast – but I won’t call them a general strategy to deal with the problem at all!

  8. Comment by Stefan | 2008/03/01 at 23:35:52

    What to call a “crisis”

    My first question is: who cares about that multicore crisis? Should I care? And if so, why?
    We are talking about a problem software developers have or maybe will have (oh yes, WE already had the crisis in 1999). All developers? Me? No. Not me. Why?

    Let me go into greater detail to explain what I mean. As CPUs became faster and faster, new CASE tools became available that allow developers to use more and more abstract designs for application development. Those tools create code, integrate libraries and put general-purpose code into your application. The effect is a shorter time to market, paid for with more calculation overhead.

    An example: create a hello world application in 1970, 1990 and 2008. They all will say “Hello world”, but the first will run 5,000 CPU cycles, the second 500,000 and the last more than 10,000,000 cycles. The hello world app of 2008 won’t be slower, though. Why? Because the CPU it runs on is more than 10,000 times faster than that of 1970.

    Now back to the crisis. My problem is that the application I just created doesn’t meet the performance requirements on the target architecture. A faster CPU isn’t available, and it doesn’t scale in a multicore environment. You can call that a multicore crisis if you want. Why not call it an overhead crisis?

    There is the so-called redshifting business, right: Google, eBay and the military won’t solve their capacity problems by going back to assembler programming. But as smoothspan wrote in his blog in 2007 – they already have a solution! So they aren’t hit by the crisis right now and won’t be in the near future.

    Maybe the mid-sized software companies are. Those that deliver 95% of the applications in use by 95% of the people. Those that deliver applications that don’t scale well in multicore environments. Those that use the tools that generate code that keeps a CPU busy with overhead most of the time. Runtime analysis of legacy software systems (which, in my opinion, make up more than 50% of all software systems) shows that a CPU spends most of its time calculating things the programmer has no idea of. Listen to the programmers; they will tell you something like “what the hell is …” or “which crazy bastard created that routine?”.

    Back to my optimization problem. I don’t have the knowledge and skills to transform my algorithms into well-scaling parallel ones. Maybe I get it running on two CPUs, maybe on four. But N is unreachable. And even if I had the skills, I wouldn’t do it anyway. This is mainly a question of my application doing the right things rather than doing the (wrong) things right (in this case: faster).

    As long as applications are as described, there will be no multicore crisis. Today’s business applications don’t perform operations 10,000 times more complex than those of 1970. Because today’s CPUs deliver 10,000 times more calculation power, there is no need for more CPUs. In the end it is all a cost problem. The question is: is parallel programming cheaper than creating less redundant code?

  9. Comment by Squirrl | 2008/04/01 at 23:02:36

    Just read the article:

    The real problem lies in determining the critical sections of your source code. Game developer or hobbyist, you still have to deal with splitting up which parts you want to run where. As of now I don’t know of any smart scheduler that will handle that for you.

    Recalling what I have read from PS3 developers, they had to either run sections of sound, video and mesh parsing on individual cores or designate critical sections to some sort of scheduler.

    Nobody really knows what will work best. It takes development time that nobody seems to have anymore. It’s all about release dates. But that is a story for another time.

    I have a dual-core 32-bit intel laptop. My real problem with the industry is that my parents’ single core 32/64-bit amd desktop leaves it behind as one would a bad dream.

    What happens is: if you want real performance, then you need real tools – logic sims for the processor and assembler commands, reference code, docs.
    You really need to know the architecture you are working on. Registers, if you will. Think about the Jaguar and the Atari developers: they knew the processors and interleave times. Once you have all that information gathered, you can start programming for your then-out-of-date processor, which is another problem with the industry.

Trackbacks & Pingbacks »»

  1. [...] Is the Multi-Core Revolution a Hype? » This Summary is from an article posted at Thinking Parallel | A Blog on Parallel Programming and [...]

  2. [...] Michael Suess contributes to an online discussion about whether multi-core processors are creating a software “crisis” or just one more milestone along the way. [...]

  3. [...] Law to delivering ever more processor cores on the same chip.  The crisis comes about because its much harder to write truly parallel software than it is to just let the chip get faster and run conventional software twice as fast every 18-24 [...]

  4. [...] add another point of view I suggest that you check out Michael Suess’s thinking Parallel blog that pretty much goes point by point through Marks original thesis. I agree with both of them that [...]
