MPI_ERR_TRUNCATE with Boost.MPI and only blocking communication
I am trying to run some Monte Carlo simulation code I wrote in C++ with MPI via the Boost.MPI wrapper. I use only blocking send and receive calls (i.e. send and recv, never isend or irecv), but after running the program for a few days, I inevitably end up with the following error:

    terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::mpi::exception> >'
      what():  MPI_Recv: MPI_ERR_TRUNCATE: message truncated

I have seen that this can happen when non-blocking calls are being made, but I cannot see how it can happen with only blocking calls.

I am sending and receiving a vector of structs:

    struct Chain {
        bool operator==(Chain chain_2);
        int index;
        int identity;
        vector<VectorThree> positions;
        vector<VectorThree> orientations;

      private:
        friend class boost::serialization::access;
        template<typename Archive>
        void serialize(Archive& ar, const unsigned int version) {
            ar& index;
            ar& identity;
            ar& positions;
            ar& orientations;
        }
    };

    using Chains = vector<Chain>;

where VectorThree is declared elsewhere as

    class VectorThree {
      public:
        VectorThree(int x, int y, int z): m_container {{x, y, z}} {};
        VectorThree(): m_container {{0, 0, 0}} {};
        VectorThree operator-();
        VectorThree operator+(const VectorThree& v_2) const;
        VectorThree operator-(const VectorThree& v_2) const;
        bool operator!=(const VectorThree& v_2) const;
        int& operator[](const size_t& i) { return m_container[i]; };
        const int& at(const size_t& i) const { return m_container.at(i); };
        VectorThree rotate_half(VectorThree axis);
        VectorThree rotate(VectorThree origin, VectorThree axis, int turns);
        VectorThree rotate(VectorThree axis, int turns);
        int sum();
        int abssum();
        VectorThree absolute();
        VectorThree sort();

      private:
        array<int, 3> m_container;
        friend class boost::serialization::access;
        template<typename Archive>
        void serialize(Archive& arch, const unsigned int) {
            arch& m_container;
        }
    };

The program sends and receives instances of this object many times before the error occurs. I have narrowed down the calls where the error occurs by printing statements before and after both sending and receiving. The sending code:

    Chains chains_send {m_us_sim->get_chains()};
    cout << "Win " << m_rank << ": Sending chains (size " << chains_send.size()
         << ") to " << win_i << "\n";
    m_world.send(win_i, swap_i, chains_send);
    cout << "Win " << m_rank << ": Sent chains to " << win_i << "\n";

The receiving code:

    Chains chains_rec;
    cout << "Win " << m_rank << ": Receiving chains from " << win_to_win[0] << "\n";
    m_world.recv(win_to_win[0], swap_i, chains_rec);
    cout << "Win " << m_rank << ": Received chains (size " << chains_rec.size()
         << ") from " << win_to_win[0] << "\n";

In the output file before the crash I have

    Win 0: Receiving chains from 1
    Win 1: Sending chains (size 2) to 0
    Win 1: Sent chains to 0

I am using version 1.65 of Boost and OpenMPI 2.0.1.
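One way I can see MPI_ERR_TRUNCATE occurring with only blocking calls is when two messages that share a source and tag but differ in size get paired with the wrong receives. MPI matches receives against sends in posting order for a given (source, tag), so if the receiver issues its receives in the opposite order to the sends, a short buffer gets matched against the long message. The following is a hypothetical illustration in plain MPI (not my actual code; the ranks, tag, and sizes are made up):

    #include <mpi.h>
    #include <vector>

    int main(int argc, char* argv[]) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        const int tag {0};

        if (rank == 0) {
            // Two blocking sends to rank 1 with the same tag but different sizes
            std::vector<int> big(100, 7);
            int small {7};
            MPI_Send(big.data(), 100, MPI_INT, 1, tag, MPI_COMM_WORLD);
            MPI_Send(&small, 1, MPI_INT, 1, tag, MPI_COMM_WORLD);
        }
        else if (rank == 1) {
            // The receiver expects the small message first, but matching is by
            // (source, tag) in send order: this receive is matched against the
            // 100-int message and aborts with MPI_ERR_TRUNCATE.
            int small;
            MPI_Recv(&small, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            std::vector<int> big(100);
            MPI_Recv(big.data(), 100, MPI_INT, 0, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }

My understanding is that Boost.MPI transmits a serialized type like Chains as two MPI messages with the same tag, the packed size followed by the payload, so if any other message can be in flight with the same source and tag (for example, if swap_i is reused for a different kind of message), the internal size receive could match a longer payload message and fail in exactly this way. I have not confirmed that this is what happens in my program.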
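To try to reproduce the problem in isolation, I would reduce it to something like the following sketch, which mirrors the blocking send/recv pattern above between two ranks (Payload is a made-up stand-in for Chain, and the tag, sizes, and iteration count are arbitrary):

    #include <boost/mpi.hpp>
    #include <boost/serialization/vector.hpp>
    #include <iostream>
    #include <vector>

    namespace mpi = boost::mpi;

    // Hypothetical stand-in for Chain: an int plus a vector, serialized the same way
    struct Payload {
        int index;
        std::vector<int> data;

        template<typename Archive>
        void serialize(Archive& ar, const unsigned int) {
            ar& index;
            ar& data;
        }
    };

    int main(int argc, char* argv[]) {
        mpi::environment env {argc, argv};
        mpi::communicator world;
        const int tag {0};

        // Rank 0 repeatedly sends, rank 1 repeatedly receives
        for (int i {0}; i != 1000000; i++) {
            if (world.rank() == 0) {
                std::vector<Payload> chains(2, Payload {i, std::vector<int>(10, i)});
                world.send(1, tag, chains);
            }
            else if (world.rank() == 1) {
                std::vector<Payload> chains;
                world.recv(0, tag, chains);
            }
        }
        if (world.rank() == 0) {
            std::cout << "Completed without truncation\n";
        }

        return 0;
    }

If this loop runs cleanly under mpirun -np 2 but the simulation still crashes, the problem is more likely in how the tags and ranks are paired up in the full program than in the blocking pattern itself.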
participants (2)
- Alex Cumberworth
- Ilja Honkonen