Thinking Parallel

A Blog on Parallel Programming and Concurrency by Michael Suess

Update: A Smart Way to Send a std::vector with MPI – and why it Fails

As many of you know, I always like learning from mistakes: my own as well as other people’s. A couple of students of mine tried to send a C++ std::vector using MPI, and I was skeptical whether their way of doing things was correct. I am now convinced that, although it probably works with most implementations today, what they did is a mistake. An interesting one nevertheless, and therefore I am posting it here.

They tried sending a vector like this (details omitted for clarity):

    // On rank 1: send N ints straight out of the vector's storage to rank 0
    MPI::COMM_WORLD.Send (&vec[0], N, MPI::INT, 0, 0);

And receiving it like this:

    // On rank 0: make size() equal to N first, then receive from rank 1
    vec.resize (N);
    MPI::COMM_WORLD.Recv (&vec[0], N, MPI::INT, 1, 0);

See the problem? Sending the vector like this is probably OK, because as far as I remember (and as far as Björn, my ever-helpful colleague and ueber-authority 🙂 on C++ issues in our research group, is concerned) C++ guarantees that a vector stores its actual data in one big block that can be accessed just like an array. And we are doing nothing else here (although beware that &vec[0], the address of the first element, is generally not the same as &vec, the address of the vector object itself; that has been taken care of in this example).
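To make the "big block" claim concrete, here is a minimal sketch that needs no MPI at all. The function fill_raw_buffer() is a hypothetical stand-in for any C-style API (an MPI receive, for instance) that knows nothing about vectors and simply writes to raw memory. The other two helper names are made up for this illustration as well.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a C-style API such as an MPI receive:
// it knows nothing about std::vector and just fills raw memory.
void fill_raw_buffer(int* buf, std::size_t count) {
    assert(buf != nullptr || count == 0);
    for (std::size_t i = 0; i < count; ++i) {
        buf[i] = static_cast<int>(i * i);
    }
}

// Returns true if the elements of v are laid out like a plain array,
// i.e. element i lives exactly i positions after element 0.
bool is_contiguous(const std::vector<int>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (&v[i] != &v[0] + i) return false;
    }
    return true;
}

// Resizes first, then lets the C-style function write through &v[0].
std::vector<int> receive_into_vector(std::size_t count) {
    std::vector<int> v;
    v.resize(count);  // size() must be right before the raw write
    if (count > 0) {  // &v[0] is only meaningful for a non-empty vector
        fill_raw_buffer(&v[0], count);
    }
    return v;
}
```

Calling receive_into_vector(5) should give a vector whose elements can afterwards be read through the normal interface, which is the pattern the students relied on.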

Receiving the data is an entirely different matter. Apparently, the students knew that the size of a vector is commonly stored inside the object somewhere and that they therefore had to make sure it was the right size before filling it. And they did that correctly by calling resize(). But the problem is that I am fairly sure that there may be other bits of information stored inside a vector-object.

I came up with an example to make this clear: Let’s say my software needs to work in a hostile environment, e.g. space. From time to time, an x-ray would run through my transistors and flip a bit or two (not that I know much about x-rays or how they can affect transistors, but that’s not the point). Therefore, the vector object I am using has the same interface from the outside and conforms to the C++ standard, but has some sophisticated error-detection and recovery mechanism hidden inside. Maybe it maintains a hash value over all elements. Or something like that. When the Recv() operation is called as shown above, the vector elements are not accessed through the correct interface, but rather through whichever mechanism MPI sees fit to use to fill an array. Maybe memcpy(). And therefore my error-correction mechanism will be all messed up after the operation, leaving the vector object in an inconsistent state. And that’s why this way of doing things is wrong.
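A toy version of that scenario might look like the sketch below. Note the assumptions: ChecksummedInts is a made-up class, not std::vector, and its "error detection" is just a running sum. It only illustrates how a raw write can bypass bookkeeping that lives inside an object. Whether a standard-conforming std::vector could carry such state is a separate question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container, NOT std::vector: keeps a running sum of its
// elements as a toy "error-detection" value.
class ChecksummedInts {
public:
    explicit ChecksummedInts(std::size_t n) : data_(n, 0), checksum_(0) {}

    // The sanctioned way to change an element: keeps the checksum current.
    void set(std::size_t i, int value) {
        assert(i < data_.size());
        checksum_ += static_cast<long long>(value) - data_[i];
        data_[i] = value;
    }

    int get(std::size_t i) const { return data_[i]; }

    // Raw access, as a C-style API would need. Writes through this
    // pointer bypass set() and therefore the checksum update.
    int* raw() { return data_.empty() ? nullptr : &data_[0]; }

    // Recomputes the sum and compares it with the stored checksum.
    bool consistent() const {
        long long sum = 0;
        for (std::size_t i = 0; i < data_.size(); ++i) sum += data_[i];
        return sum == checksum_;
    }

private:
    std::vector<int> data_;
    long long checksum_;
};
```

After a call to set() the object stays consistent. After a write through raw() the stored sum no longer matches the elements, which is the inconsistent state described above.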

The point I am trying to make here is: A vector is an object. Like any object, it may have hidden implementation details inside and therefore you must use its interface to change it, like any other object. Counting on object-internals like the fact that it probably has a big block of memory inside to store the actual data will not do, although it may work with your present implementation!

Wow, I really got into preaching mode here :o, sorry about that. And now I should probably really go and find me a copy of the C++-standard to validate all my theories against – and hope that I am able to understand the language they use in there… 😯

[Update]: Thanks a lot for all the comments. I have bought the C++ standard because I wanted to make sure, and I would like to add a few things:

  1. In the version of the 2003 standard I have here it says: “The elements of a vector are stored contiguously”, so that settles that question. This makes my last statement somewhat wrong: contiguous storage is not an implementation detail, exactly as I guessed at the beginning of my article.
  2. Nowhere in the standard (or at least nowhere that I could find; the dang thing is 786 pages long after all) does it say anything about other internal data stored in a vector object. And this makes the main point of the article still valid: if there is a vector implementation that stores more than the size of the vector internally (like the error-correction code I mention in my article), it is not safe to access the contents of the internal array in the way described. You have to go through the interface of the vector (which, incidentally, will be no problem most of the time, because operator[] is part of the interface). But things like memcpy() probably won’t do, although they will for an array. If anyone has anything substantial to say to counter this point, I would be really interested in your comments!
  3. Thanks for the helpful resources; this one from Herb Sutter was especially enlightening.
  4. And last but not least: resize() is required, reserve() won’t do (because the size that is stored internally will be wrong if you use reserve()).
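The difference behind that last point can be checked with a few lines. This is only a sketch: SizeInfo and the two helper functions are names made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Snapshot of what a vector reports about itself.
struct SizeInfo {
    std::size_t size;
    std::size_t capacity;
};

// reserve() only allocates room; the vector still holds zero elements.
SizeInfo after_reserve(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);
    assert(v.capacity() >= n);
    SizeInfo info = { v.size(), v.capacity() };
    return info;
}

// resize() actually creates n value-initialized elements.
SizeInfo after_resize(std::size_t n) {
    std::vector<int> v;
    v.resize(n);
    assert(v.capacity() >= n);
    SizeInfo info = { v.size(), v.capacity() };
    return info;
}
```

After reserve(N) the vector reports size() == 0, so it does not know about anything a raw write puts into that memory. After resize(N) it reports size() == N and the elements really exist.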

Thanks for all your comments, this was very enlightening!

Comments

  1. Comment by teki | 2007/02/08 at 12:56:36

    “Counting on object-internals like the fact that it probably has a big block of memory inside to store the actual data will not do, although it may work with your present implementation!”

    If it’s a standard-compliant STL implementation, then it has to work.

    Recommended reading: http://www.gotw.ca/publications/mill10.htm

  2. Comment by Stefan Eilemann | 2007/02/08 at 12:58:11

    vector.resize can also be terribly slow, since you are inserting n elements into the vector. I discovered that when storing image data in a vector and sending it over the network as described above. I’m now back to using C arrays.

  3. Comment by wreel | 2007/02/08 at 13:05:17

    That’s actually safe. std::vector types are, by design, the STL’s interface to legacy C array API calls. The students look like they did the proper research, and it falls in line with Scott Meyers’s Item 16 in “Effective STL”. If Stepanov had had his druthers, you would/should have been able to pass vec.begin() to Recv(), but quirkiness in certain STL implementations called for explicitly taking the address of a reference.

  4. Comment by teki | 2007/02/08 at 13:10:34

    Do not use resize, use reserve, that’s what you want.

    The statement of the article is false.
    Recommended reading:
    http://www.gotw.ca/publications/mill10.htm

    And I recommend reading a few books about the STL.

  5. Comment by Stuart Dootson | 2007/02/15 at 20:26:07

    Ummmm, I think your students’ code is reasonably OK. They access the vector contents using the interface only: operator[]. That returns a reference to the first element of the vector contents. Taking its address gives a pointer to the first location of the contiguous storage you mentioned. Nowhere are you accessing the *internals* of the vector object. It’s like this (view with a monospace font!):

    +-------------------------+
    | vector object internals |
    +-------------------------+
                |
                |
                V
       +-----------------+
       | vector contents |
       +-----------------+
       ^
       |
       = &vec[0]

    memcpy is fine with a vector (though not idiomatic C++/STL – I tend to use std::copy instead).

    You might like to look at Boost.MPI (http://www.generic-programming.org/~dgregor/boost.mpi/doc/index.html) – a C++ interface to MPI that’s built on top of standard C MPI implementations.

  6. Comment by Mahantesh | 2008/03/12 at 19:32:44

    Thanks! This works for me. I usually do vec1.reserve(N) within a try/catch block to reserve its space. The next operation will be .resize(N); that way you can hope for O(N) (it will not have to move data around if N is large).

  7. Comment by Lucas Clemente Vella | 2017/10/06 at 23:34:56

    Sorry for necroposting, but I have to strongly disagree…

    It is impossible to implement the kind of space-resistant std::vector you proposed without violating the explicitly defined standard interface. For instance, in the following code:

    std::vector<int> v(10);
    v[0] = 42;

    where did vector have the chance to update the internal error-correcting code? operator[] must return a reference (as per the definition of std::allocator<T>::reference, in pre-2011 C++), not some other object you can assign to and have the value updated inside the vector. So, by definition (even if indirectly), vector’s private data can’t depend on the value of the data stored by the user.

    Secondly, assume the space-resistant std::vector violated the standard and, instead of a plain reference to T, operator[] returned something that allows the error correction to work. Now suppose someone changed the data via:

    *(&v[0]) = 15; // assuming &v[0] still works, because it is no longer a reference!

    what would happen on the next access to v[0]? Would an exception be thrown? There is no standard interface to handle the situation, and I am pretty sure implementors are not free to change when a standard container can or cannot throw an exception… unless it is no longer standard.

