Do these tasks share a lot of data? If they are really lightweight
memory-wise, heavy computationally, and don't require fine-grained
communication with each other, I'd go with David's suggestion, as it
will be easier to write, and the performance won't be much different.
If you use a lot of memory, need fine-grained chatter between tasks,
or the tasks are pretty cheap, threads may be (much) better.
Brian
On Wed, Nov 3, 2010 at 5:19 PM, Hicham Mouline wrote:
-----Original Message-----
From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Dave Abrahams
Sent: 03 November 2010 23:54
To: boost-users@lists.boost.org
Subject: Re: [Boost-users] hybrid parallelism
On Thu, Nov 4, 2010 at 8:16 AM, Brian Budge wrote:

Hi Hicham -

Yes, you can use MPI (possibly through boost::mpi) to distribute tasks to multiple machines, and then use threads on those machines to work on finer-grained portions of those tasks. From another thread on this list, there are constructs in boost::asio that handle task queuing for the thread tasks.
If I were you I would start by trying to do this with N processes per machine, rather than N threads, since you need the MPI communication anyway.
--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
Just temporarily? After that, you would still add a layer of multithreading to each process and have only one process per machine, no?
One process with N threads per machine probably gives better total wall time than N single-threaded processes, because there is no need to duplicate the input memory for each task.
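A quick sketch of that memory argument: in one process, every thread can read the same input through a reference, whereas separate processes would each need their own copy of it sent over MPI. The names below (partial_sum, run_threads) are illustrative:

```cpp
// Threads share one copy of the input; no per-worker duplication.
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

double partial_sum(const std::vector<double>& input,   // shared, not copied
                   std::size_t begin, std::size_t end) {
    return std::accumulate(input.begin() + begin, input.begin() + end, 0.0);
}

std::vector<double> run_threads(const std::vector<double>& input, unsigned n) {
    std::vector<double> results(n);
    std::vector<std::thread> threads;
    std::size_t chunk = input.size() / n;
    for (unsigned i = 0; i < n; ++i) {
        std::size_t b = i * chunk;
        std::size_t e = (i + 1 == n) ? input.size() : b + chunk;
        // Each thread captures `input` by reference: one copy in memory
        // total, however many workers there are.
        threads.emplace_back([&input, &results, i, b, e] {
            results[i] = partial_sum(input, b, e);
        });
    }
    for (auto& t : threads) t.join();
    return results;
}
```

With N processes instead, the equivalent of `input` would be serialized and stored once per process.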
The question I really wanted to ask is this: I expect to have M*N outstanding threads (M machines, N threads in each process) just sitting there waiting for jobs. Then, from the user interface, I click and that starts 100000 tasks, which are spread over the M machines and the N threads in each process. The results come back and are displayed... Then the user clicks again and the same thing happens.
You're saying this is doable with Boost.MPI + an MPI implementation?
I wasn't expecting to divide the tasks into finer-grained ones. All the tasks are atomic and have about the same execution time. It's rather: pass 100000/M tasks to each machine, then divide that number by N for each thread in that process. This last bit is up to me to code. Ideally, a task is just a functor with an operator() member, and the M machines and N threads are treated uniformly. I guess it's up to me to write an abstraction layer that views the whole M*N set of workers in a flat way.
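That flat M*N view can be as simple as numbering the workers 0..M*N-1 and handing each a contiguous block of task indices. A hypothetical sketch of the partitioning arithmetic (task_range and flat_id are made-up names, not library calls):

```cpp
// Partition `total` tasks over M machines * N threads, viewed flat.
// Remainder tasks are distributed one-by-one to the first workers, so
// block sizes differ by at most one.
#include <algorithm>
#include <cstddef>
#include <utility>

// Half-open [begin, end) range of global task indices for worker
// `index` out of `workers`.
std::pair<std::size_t, std::size_t>
task_range(std::size_t total, std::size_t workers, std::size_t index) {
    std::size_t base  = total / workers;
    std::size_t extra = total % workers;
    std::size_t begin = index * base + std::min(index, extra);
    std::size_t end   = begin + base + (index < extra ? 1 : 0);
    return {begin, end};
}

// Flat worker id for thread n on machine m, with N threads per machine.
std::size_t flat_id(std::size_t m, std::size_t n, std::size_t N) {
    return m * N + n;
}
```

Each machine's MPI rank would compute its threads' ranges locally; the ranges tile [0, total) with no gaps or overlap.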
I have other questions, more architectural in nature; I'm not sure whether this is the best place to ask them.
Regards,
_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users