This partly depends on how many processors/machines you have
available. You need to find a way of partitioning your state space
into tasks, and then doling those tasks out to processes/threads. How
expensive is f()? How much memory is used?
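The partitioning Brian describes can be sketched as a static split of the flat index space of evaluations across workers. This is a minimal sketch, not code from the thread; task_range is a hypothetical helper:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper: given `total` evaluations and `nprocs` workers,
// return the half-open range [begin, end) of flat indices assigned to
// `rank`. Earlier ranks absorb the remainder, so range sizes differ by
// at most one.
std::pair<std::int64_t, std::int64_t>
task_range(std::int64_t total, int nprocs, int rank) {
    std::int64_t base = total / nprocs;
    std::int64_t rem  = total % nprocs;
    std::int64_t begin = rank * base + (rank < rem ? rank : rem);
    std::int64_t end   = begin + base + (rank < rem ? 1 : 0);
    return {begin, end};
}
```

Each worker then evaluates only its own range, and the per-worker results are combined at the end.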
Brian
On Thu, Nov 18, 2010 at 3:00 AM, Hicham Mouline wrote:
-----Original Message-----
From: boost-users-bounces@lists.boost.org On Behalf Of Matthias Troyer

I would go with reading once and broadcasting, especially if, as was mentioned before, one aims at going to thousands of processes. No I/O system scales to that many concurrent readers, and implementing the broadcast is trivial: a single function call.
Matthias
The large calculation that I currently do serially, and that I intend to parallelize, is the maximum of the return values of a large number of evaluations of a given "function" in the mathematical sense. The number of arguments of the function is only known at runtime. Let's say it is determined at runtime that the number of arguments is 10, i.e. we have 10 arguments x0, ..., x9. Each argument can take a different number of values, e.g.

x0 can be x0_0, x0_1, ..., x0_n0
x1 can be x1_0, x1_1, ..., x1_n1

and so on. n0 and n1 are typically known only at runtime and differ from each other.
So serially, I run

f(x0_0, x1_0, ..., x9_0)
f(x0_0, x1_0, ..., x9_1)
...
f(x0_0, x1_0, ..., x9_n9)

then the same with all the x8 values, then all the x7, ..., then all the x0. That makes n0*n1*...*n9 runs in total. Then I take the maximum of the return values.
Assuming I have N MPI processes, ideally each process would run n0*n1*...*n9/N function evaluations. How do I split the work?
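One way to split: treat (i0, ..., i9) as the digits of a mixed-radix number, so every evaluation has a single flat index k in [0, n0*n1*...*n9), and each process works through its own contiguous block of k values. A sketch of the decoding step (decode is a hypothetical helper, with x9 varying fastest to match the serial order above):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Decode a flat index k in [0, n[0]*n[1]*...*n[9]) into one sub-index
// per argument. Digit d ranges over [0, n[d]); the last digit varies
// fastest, matching the serial loop order.
std::vector<int> decode(std::int64_t k, const std::vector<int>& n) {
    std::vector<int> idx(n.size());
    for (std::size_t d = n.size(); d-- > 0; ) {
        idx[d] = static_cast<int>(k % n[d]);
        k /= n[d];
    }
    return idx;
}
```

With this, no per-task communication is needed: every rank can compute its own argument tuples from k alone, and only the final maxima have to be combined.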
In terms of the current implementation, each of the x's is a boost::variant over 4 types: a double, a pair, a triplet, or a vector<double>. A visitor is applied recursively to the variants in order to traverse the whole parameter space: apply_visitor on x0 => say x0 is a triplet, then for (x0 from min to max with increment) apply_visitor on x1, and so on until x9, at which point we actually call f with all the arguments collected so far. How can one parallelize such a beast?
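One possible shape for the answer (a sketch under assumptions, not the original code): run the visitor once per variant to expand it into its list of candidate values, so the recursive traversal collapses into a flat loop that any rank can enter mid-stream. local_max and the expanded `values` table are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <limits>
#include <vector>

// Sketch: values[d] holds the expanded candidates for argument d (as
// produced by one visitor pass over each variant), f is the function
// to maximize. A rank scans only its flat-index range [begin, end),
// decoding each k into an argument tuple on the fly.
double local_max(const std::vector<std::vector<double>>& values,
                 const std::function<double(const std::vector<double>&)>& f,
                 std::int64_t begin, std::int64_t end) {
    std::vector<double> args(values.size());
    double best = -std::numeric_limits<double>::infinity();
    for (std::int64_t k = begin; k < end; ++k) {
        std::int64_t r = k;  // treat k as mixed-radix digits, last varies fastest
        for (std::size_t d = values.size(); d-- > 0; ) {
            const auto& v = values[d];
            args[d] = v[r % v.size()];
            r /= v.size();
        }
        double y = f(args);
        if (y > best) best = y;
    }
    return best;
}
```

The per-rank results could then be combined with a single reduction, e.g. boost::mpi::reduce(world, best, global_best, boost::mpi::maximum<double>(), 0).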
rds,
_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users