Do these tasks share a lot of data? If they are really lightweight
memory-wise, heavy computationally, and don't require fine-grained
communication with each other, I'd go with David's suggestion, as it
will be easier to write, and the performance won't be much different.
If you use a lot of memory, need fine-grained chatter between tasks,
or the tasks are pretty cheap, threads may be (much) better.
Brian
On Wed, Nov 3, 2010 at 5:19 PM, Hicham Mouline wrote:
-----Original Message-----
From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Dave Abrahams
Sent: 03 November 2010 23:54
To: boost-users@lists.boost.org
Subject: Re: [Boost-users] hybrid parallelism
On Thu, Nov 4, 2010 at 8:16 AM, Brian Budge wrote:

Hi Hicham -

Yes, you can use MPI (possibly through boost::mpi) to distribute tasks to multiple machines, and then use threads on those machines to work on finer-grained portions of those tasks. From another thread on this list, there are constructs in boost::asio that handle task queuing for the thread tasks.
If I were you I would start by trying to do this with N processes per machine, rather than N threads, since you need the MPI communication anyway.
--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
Just temporarily? After that, you would still add a layer of multithreading to each process and have only one process per machine, no?
One process with N threads per machine probably gives better total wall time than N single-threaded processes, because there is no need to duplicate the input memory for each task.
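A quick sketch of that memory argument: in one process, every thread can read the same input through a reference, whereas separate processes would each need their own copy of it sent over MPI. The names below (partial_sum, run_threads) are illustrative:

```cpp
// Threads share one copy of the input; no per-worker duplication.
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

double partial_sum(const std::vector<double>& input,   // shared, not copied
                   std::size_t begin, std::size_t end) {
    return std::accumulate(input.begin() + begin, input.begin() + end, 0.0);
}

std::vector<double> run_threads(const std::vector<double>& input, unsigned n) {
    std::vector<double> results(n);
    std::vector<std::thread> threads;
    std::size_t chunk = input.size() / n;
    for (unsigned i = 0; i < n; ++i) {
        std::size_t b = i * chunk;
        std::size_t e = (i + 1 == n) ? input.size() : b + chunk;
        // Each thread captures `input` by reference: one copy in memory
        // total, however many workers there are.
        threads.emplace_back([&input, &results, i, b, e] {
            results[i] = partial_sum(input, b, e);
        });
    }
    for (auto& t : threads) t.join();
    return results;
}
```

With N processes instead, the equivalent of `input` would be serialized and stored once per process.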
The question I really wanted to ask is this: I expect to have M*N outstanding threads (M machines, N threads in each process) just sitting there waiting for jobs. Then, from the user interface, I click and that starts 100000 tasks, which are spread over the M machines and the N threads in each process. The results come back and are displayed... Then the user clicks again and the same thing happens.
You're saying this is doable with Boost.MPI + an MPI implementation?
I wasn't expecting to divide the tasks into finer-grained ones. All the tasks are atomic and have about the same execution time. It's rather: pass 100000/M tasks to each machine, then divide that number by N for each thread in that process. This last bit is up to me to code. Ideally, a task is just a functor with an operator() member, and the M machines and N threads are treated uniformly. I guess it's up to me to write an abstraction layer that views the whole M*N set of workers in a flat way.
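That flat M*N view can be as simple as numbering the workers 0..M*N-1 and handing each a contiguous block of task indices. A hypothetical sketch of the partitioning arithmetic (task_range and flat_id are made-up names, not library calls):

```cpp
// Partition `total` tasks over M machines * N threads, viewed flat.
// Remainder tasks are distributed one-by-one to the first workers, so
// block sizes differ by at most one.
#include <algorithm>
#include <cstddef>
#include <utility>

// Half-open [begin, end) range of global task indices for worker
// `index` out of `workers`.
std::pair<std::size_t, std::size_t>
task_range(std::size_t total, std::size_t workers, std::size_t index) {
    std::size_t base  = total / workers;
    std::size_t extra = total % workers;
    std::size_t begin = index * base + std::min(index, extra);
    std::size_t end   = begin + base + (index < extra ? 1 : 0);
    return {begin, end};
}

// Flat worker id for thread n on machine m, with N threads per machine.
std::size_t flat_id(std::size_t m, std::size_t n, std::size_t N) {
    return m * N + n;
}
```

Each machine's MPI rank would compute its threads' ranges locally; the ranges tile [0, total) with no gaps or overlap.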
I have other questions, more architectural in nature; I'm not sure whether this is the best place to ask them.
Regards,
_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users