Re: [Boost-users] Hybrid parallelism, no more + mpi+serialization, many questions
hello,

Subsequent to a previous thread asking whether to combine MPI and OpenMP to parallelize a large problem, I've been advised to go through MPI only, as it would be simpler, and because MPI implementations on the same box use shared memory, which doesn't have a huge cost (still some, compared to a single-process multithreaded program where objects are actually shared naturally). Writing this, a question comes up:

1. In the "shared memory" of many MPI processes on the same box, is an object (say a list of numbers) actually shared between the two processes' address spaces? I guess not, unless one explicitly makes it so with the "shared memory API" (Unix-specific?).

So, I currently have a serial application with a GUI that runs some calculations. My next step is to use Open MPI with the help of the Boost.MPI wrapper library in C++ to parallelize those calculations. There is a set of static data objects created once at startup or loaded from files.

2. What are the pros/cons of loading the static data objects individually from each separate MPI process vs. broadcasting the static data via MPI itself, after only the master reads/sets up the static data?

3. Is it possible to choose the binary archive instead of the text archive when serializing my user-defined types? Where do I deal with the endianness issue, given that I may have Intel/Sparc/PowerPC CPUs?

regards,
On Tue, Nov 16, 2010 at 10:22 AM, Hicham Mouline wrote:
hello,
Subsequent to a previous thread asking whether to combine MPI and OpenMP to parallelize a large problem, I've been advised to go through MPI only, as it would be simpler, and because MPI implementations on the same box use shared memory, which doesn't have a huge cost (still some, compared to a single-process multithreaded program where objects are actually shared naturally). Writing this, a question comes up:

1. In the "shared memory" of many MPI processes on the same box, is an object (say a list of numbers) actually shared between the two processes' address spaces? I guess not, unless one explicitly makes it so with the "shared memory API" (Unix-specific?).
In MPI, each process has access only to the memory that it directly controls, and data must be explicitly transferred between processes, even if that memory is physically shared. If you break that model, you are playing with fire.
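A minimal sketch of what "explicitly transferred" means in Boost.MPI terms; the vector stands in for the "list of numbers" from question 1, and the ranks and tag are arbitrary illustrative choices:

    #include <boost/mpi.hpp>
    #include <boost/serialization/vector.hpp> // lets Boost.MPI serialize std::vector
    #include <iostream>
    #include <vector>

    int main(int argc, char* argv[])
    {
        boost::mpi::environment env(argc, argv);
        boost::mpi::communicator world;

        if (world.rank() == 0) {
            std::vector<double> numbers(1000, 3.14);
            world.send(1, 0, numbers);   // rank 1 sees nothing until this send
        } else if (world.rank() == 1) {
            std::vector<double> numbers; // a separate object in a separate address space
            world.recv(0, 0, numbers);   // filled only by the explicit receive
            std::cout << "rank 1 received " << numbers.size() << " values\n";
        }
        return 0;
    }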
So, I currently have a serial application with a GUI that runs some calculations. My next step is to use OpenMPI with the help of the Boost.MPI wrapper library in C++ to parallelize those calculations. There is a set of static data objects created once at startup or loaded from files.
2. What are the pros/cons of loading the static data objects individually from each separate MPI process vs. broadcasting the static data via MPI itself, after only the master reads/sets up the static data?
It is easier to load them from disk on each process (you don't have to deal with serialization and passing the structure). Typically you will not see a performance problem if this is only a one-time startup cost and if you are not loading massive data files from a file system with weak IO capabilities onto very many MPI processes.
3. Is it possible to choose the binary archive instead of the text archive when serializing my user-defined types? Where do I deal with the endianness issue given that I may have Intel/Sparc/PowerPC CPUs?
Not sure how boost::serialization handles that one... There are probably compiler flags that you can set to change endian-ness if needed though.
regards,
At Tue, 16 Nov 2010 12:41:08 -0700, James C. Sutherland wrote:
Where do I deal with the endianness issue given that I may have Intel/Sparc/PowerPC CPUs?
Not sure how boost::serialization handles that one... There are probably compiler flags that you can set to change endian-ness if needed though.
IIUC MPI, and thus Boost.MPI, handles it for you transparently.

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
On 16 November 2010 at 20:19, David Abrahams wrote:
At Tue, 16 Nov 2010 12:41:08 -0700, James C. Sutherland wrote:
Where do I deal with the endianness issue given that I may have Intel/Sparc/PowerPC CPUs?
Not sure how boost::serialization handles that one... There are probably compiler flags that you can set to change endian-ness if needed though.
IIUC MPI, and thus Boost.MPI, handles it for you transparently.
I'm a bit unclear. MPI uses serialization to serialize user-defined types (you write the serialize template function). I don't know if MPI lets you choose whether you want a binary archive or a text/xml archive. If you can choose the binary archive, wouldn't the issue then be with serialization rather than MPI? What about primitive types like a double?

This will be a quick test.

regards,
On Tue, Nov 16, 2010 at 9:52 PM, Hicham Mouline wrote:
MPI uses serialization to serialize user-defined types (you write the serialize template function). I don't know if MPI lets you choose whether you want a binary archive or a text/xml archive.
Boost.MPI chooses the archive: its own specialized archive version.

In general, with Boost.Serialization, the serialization functions you write are independent of the archive: they work equally well with a text/xml archive as with a binary archive. Boost.MPI exploits this and defines its own archive types to translate your classes into something that (C-level) MPI can handle.

You can influence the process to gain some more speed with types that directly map to MPI types; see the Boost.MPI manual:

http://www.boost.org/doc/libs/1_44_0/doc/html/mpi/tutorial.html#mpi.performance_optimizations

Best regards,
Riccardo

--
Riccardo Murri, Grid Computing Competence Centre, http://www.gc3.uzh.ch/
Organisch-Chemisches Institut, University of Zurich
Winterthurerstrasse 190, CH-8057 Zürich (Switzerland)
Tel: +41 44 635 4222    Fax: +41 44 635 6888
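A sketch of the optimization Riccardo points to, assuming a fixed-layout struct (the type below is made up for illustration): marking it with BOOST_IS_MPI_DATATYPE lets Boost.MPI build an MPI datatype for it once, instead of running the serialization code for every value sent.

    #include <boost/mpi.hpp>
    #include <boost/mpi/datatype.hpp>

    struct sample_point {                // hypothetical fixed-layout type
        double value;
        int index;

        // still required: Boost.MPI walks it once to learn the layout
        template <class Archive>
        void serialize(Archive& ar, const unsigned int /*version*/)
        {
            ar & value & index;
        }
    };

    // the opt-in: declares that sample_point maps directly to an MPI datatype
    BOOST_IS_MPI_DATATYPE(sample_point)

This is only safe for types whose layout MPI can describe directly: no pointers and no dynamically sized members such as std::vector.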
On 16 November 2010 at 21:08, Riccardo Murri wrote:
On Tue, Nov 16, 2010 at 9:52 PM, Hicham Mouline wrote:

MPI uses serialization to serialize user-defined types (you write the serialize template function). I don't know if MPI lets you choose whether you want a binary archive or a text/xml archive.
Boost.MPI chooses the archive: its own specialized archive version.
In general, with Boost.Serialization, the serialization functions you write are independent of the archive: they work equally well with a text/xml archive as with a binary archive. Boost.MPI exploits this and defines its own archive types to translate your classes into something that (C-level) MPI can handle.
You can influence the process to gain some more speed with types that directly map to MPI types, see the Boost.MPI manual:
http://www.boost.org/doc/libs/1_44_0/doc/html/mpi/tutorial.html#mpi.performance_optimizations
Therefore endianness and bitness are not an issue, even for primitive types inside complex user-defined types.

For example, given

    struct my_type { double d1; int d2; };

I then write the templated serialize() function for my_type. MPI should be able to send (after serialization) an instance of my_type from an Intel box to a Sparc box, where the instance is deserialized and the object is properly reconstructed (d1 will be correct).

cool,
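Spelled out, a minimal sketch of that round trip (the ranks and tag are arbitrary choices for illustration):

    #include <boost/mpi.hpp>

    struct my_type {
        double d1;
        int d2;

        // the templated serialization hook; the same function works with
        // text, binary, and Boost.MPI's own archive types
        template <class Archive>
        void serialize(Archive& ar, const unsigned int /*version*/)
        {
            ar & d1 & d2;
        }
    };

    int main(int argc, char* argv[])
    {
        boost::mpi::environment env(argc, argv);
        boost::mpi::communicator world;

        if (world.rank() == 0) {
            my_type v = {3.14, 42};
            world.send(1, 0, v);   // serialized by Boost.MPI on the way out
        } else if (world.rank() == 1) {
            my_type v;
            world.recv(0, 0, v);   // reconstructed here, whatever this
        }                          // machine's native byte order is
        return 0;
    }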
On 16 Nov 2010, at 20:41, James C. Sutherland wrote:
On Tue, Nov 16, 2010 at 10:22 AM, Hicham Mouline wrote:

2. What are the pros/cons of loading the static data objects individually from each separate MPI process vs. broadcasting the static data via MPI itself, after only the master reads/sets up the static data?
It is easier to load them from disk on each process (you don't have to deal with serialization and passing the structure). Typically you will not see a performance problem if this is only a one-time startup cost and if you are not loading massive data files from a file system with weak IO capabilities onto very many MPI processes.
I would go with reading once and broadcasting, especially if, as was mentioned before, one aims at going to thousands of processes. No I/O system can scale, and implementing the broadcast is trivial: a single function call.

Matthias
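A sketch of that single call; load_static_data() below is a placeholder for however the master actually reads its files:

    #include <boost/mpi.hpp>
    #include <boost/serialization/vector.hpp> // lets Boost.MPI serialize std::vector
    #include <vector>

    // placeholder for the real file I/O
    std::vector<double> load_static_data()
    {
        return std::vector<double>(1000, 1.0);
    }

    int main(int argc, char* argv[])
    {
        boost::mpi::environment env(argc, argv);
        boost::mpi::communicator world;

        std::vector<double> static_data;
        if (world.rank() == 0)
            static_data = load_static_data(); // only the master touches the file system

        // the single function call: afterwards every rank holds a copy
        boost::mpi::broadcast(world, static_data, /*root=*/0);
        return 0;
    }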
Matthias Troyer wrote:

I would go with reading once and broadcasting, especially if, as was mentioned before, one aims at going to thousands of processes. No I/O system can scale, and implementing the broadcast is trivial: a single function call.
Matthias
The large calculation that I currently do serially, and that I intend to parallelize, is the maximum of the return values of a large number of evaluations of a given "function" in the mathematical sense. The number of arguments of the function is only known at runtime. Let's say it is determined at runtime that the number of arguments is 10, i.e. we have 10 arguments x0, ..., x9. Each argument can take a different number of values: e.g. x0 can be x0_0, x0_1, ..., x0_n0; x1 can be x1_0, x1_1, ..., x1_n1; and so on. n0, n1, ... are typically known only at runtime and differ from each other.

So, serially, I run

    f(x0_0, x1_0, ..., x9_0)
    f(x0_0, x1_0, ..., x9_1)
    ...
    f(x0_0, x1_0, ..., x9_n9)

then with all the x8, then all the x7, ..., then all the x0. There are n0*n1*...*n9 runs. Then I take the maximum of the return values.

Imagining I have N MPI processes, ideally each process would run n0*n1*...*n9/N function evaluations. How do I split?

In terms of the current implementation, each of the x's is a boost::variant over 4 types: a double, a pair, a triplet, or a vector<double>. A visitor is applied recursively to the variants in order to traverse the whole parameter space: apply_visitor on x0 => say x0 is a triplet, then for (x0 from min to max with increment) apply_visitor on x1, and so on until x9; then we actually call the f function with all the arguments collected so far.

How can one parallelize such a beast?

rds,
This partly depends on how many processors/machines you have available. You need to find a way of partitioning your state space into tasks, and then doling those tasks out to processes/threads. How expensive is f()? How much memory is used?
Brian
On Thu, Nov 18, 2010 at 3:00 AM, Hicham Mouline wrote:
The large calculation that I currently do serially, and that I intend to parallelize, is the maximum of the return values of a large number of evaluations of a given "function" in the mathematical sense. The number of arguments of the function is only known at runtime. Let's say it is determined at runtime that the number of arguments is 10, i.e. we have 10 arguments x0, ..., x9. Each argument can take a different number of values: e.g. x0 can be x0_0, x0_1, ..., x0_n0; x1 can be x1_0, x1_1, ..., x1_n1; and so on. n0, n1, ... are typically known only at runtime and differ from each other.

So, serially, I run

    f(x0_0, x1_0, ..., x9_0)
    f(x0_0, x1_0, ..., x9_1)
    ...
    f(x0_0, x1_0, ..., x9_n9)

then with all the x8, then all the x7, ..., then all the x0. There are n0*n1*...*n9 runs. Then I take the maximum of the return values.

Imagining I have N MPI processes, ideally each process would run n0*n1*...*n9/N function evaluations. How do I split?

In terms of the current implementation, each of the x's is a boost::variant over 4 types: a double, a pair, a triplet, or a vector<double>. A visitor is applied recursively to the variants in order to traverse the whole parameter space: apply_visitor on x0 => say x0 is a triplet, then for (x0 from min to max with increment) apply_visitor on x1, and so on until x9; then we actually call the f function with all the arguments collected so far.

How can one parallelize such a beast?

rds,
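One possible sketch of the partitioning Brian describes, not the thread's actual code: flatten the n0*n1*...*n9 grid into a single index range, deal the flat indices out round-robin, and combine the per-process maxima with a reduction. The toy f() and the hard-coded sizes stand in for the real variant/visitor machinery.

    #include <boost/mpi.hpp>
    #include <algorithm>
    #include <iostream>
    #include <limits>
    #include <vector>

    // placeholder for the real evaluation reached through the visitors
    double f(const std::vector<std::size_t>& idx)
    {
        double s = 0;
        for (std::size_t i = 0; i < idx.size(); ++i) s += idx[i];
        return s;
    }

    int main(int argc, char* argv[])
    {
        boost::mpi::environment env(argc, argv);
        boost::mpi::communicator world;

        std::vector<std::size_t> n;              // n0, n1, ..., known at runtime
        n.push_back(4); n.push_back(5); n.push_back(3);

        std::size_t total = 1;                   // n0*n1*...*n9 evaluations in all
        for (std::size_t d = 0; d < n.size(); ++d) total *= n[d];

        double local_max = -std::numeric_limits<double>::infinity();

        // round-robin split: rank r evaluates flat indices r, r+N, r+2N, ...
        for (std::size_t flat = world.rank(); flat < total; flat += world.size()) {
            std::vector<std::size_t> idx(n.size());
            std::size_t rest = flat;
            for (std::size_t d = 0; d < n.size(); ++d) {  // mixed-radix decode
                idx[d] = rest % n[d];
                rest /= n[d];
            }
            local_max = std::max(local_max, f(idx));
        }

        double global_max;
        boost::mpi::all_reduce(world, local_max, global_max,
                               boost::mpi::maximum<double>());
        if (world.rank() == 0)
            std::cout << "max = " << global_max << std::endl;
        return 0;
    }

Round-robin indexing keeps the load roughly balanced without any communication. If individual f() calls vary wildly in cost, a master/worker scheme that hands out chunks of indices on demand would balance better, at the price of some messaging.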
participants (6)
- Brian Budge
- David Abrahams
- Hicham Mouline
- James C. Sutherland
- Matthias Troyer
- Riccardo Murri