two boost.threads taking much longer than one
Hi everyone,
I've got an operation which takes 14 seconds on a single thread, but is taking up to 45 seconds when running via two boost.threads on a dual-processor machine (2x 3 GHz Xeons, Fedora Core 1).
The thread functor objects are being passed a lot of references (about 20) to some very large std::vectors. One thread always reads/writes from the start of the vectors to their midpoint, and the second thread from (midpoint+1) to the end of the vectors.
The only thing I can think of is that the threads are taking a long time to fork and get running, perhaps moving a lot of stack data around or something? When I look at the output of 'top' with threads displayed, I see the two threads appear, but they only use about 50-60% CPU utilization each. The system monitor shows mostly red (Kernel) as opposed to blue (User).
Any ideas?
--
Andrew Chapman
Senior Technical Director - Framestore CFC
On Thu, 07 Oct 2004 18:14:53 +0100, Andrew Chapman wrote:
> The thread functor objects are being passed a lot of references (about 20) to some very large std::vectors. One thread always reads/writes from the start of the vectors to their midpoint, and the second thread from (midpoint+1) to the end of the vectors.
Perhaps you could provide some code? Are the threads using any sort of synchronization between themselves? Perhaps you're spending a lot of time contending on a mutex if so. If they're just operating on some shared resources (the vectors) with no locking, they should certainly run faster than the single-threaded case on a dual CPU machine.
> The only thing I can think of is that the threads are taking a long time to fork and get running, perhaps moving a lot of stack data around or something?
I'd be surprised if the thread creation took any significant part of your 45-second runtime.
--
Caleb Epstein caleb.epstein@gmail.com
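For reference, the lock-free split described above can be sketched as follows. This is a minimal illustration, not the poster's CalcCurvesThread: it assumes C++11 std::thread as a stand-in for boost::thread, and process_range and process_in_two_threads are hypothetical names. Each thread touches only its own inclusive index range, so no mutex is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-curve work: doubles every element in the
// inclusive index range [first, last].
void process_range(std::vector<double>& data, std::size_t first, std::size_t last) {
    assert(first <= last && last < data.size());
    for (std::size_t i = first; i <= last; ++i) {
        data[i] *= 2.0;
    }
}

// Splits the vector at its midpoint and processes each half on its own thread.
// The two index ranges never overlap, so the threads need no locking.
void process_in_two_threads(std::vector<double>& data) {
    if (data.size() < 2) {
        // Too small to split: do the work on the calling thread.
        if (!data.empty()) process_range(data, 0, data.size() - 1);
        return;
    }
    const std::size_t mid = (data.size() - 1) / 2;
    // std::ref passes the vector by reference; without it std::thread would
    // try to copy the argument.
    std::thread t1(process_range, std::ref(data), std::size_t{0}, mid);
    std::thread t2(process_range, std::ref(data), mid + 1, data.size() - 1);
    t1.join();
    t2.join();
}
```

Timing this against a single call to process_range over the whole vector gives a baseline. If a harness like this speeds up on the dual-CPU machine but the real functor does not, the slowdown is probably inside the per-curve work rather than in thread start-up.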
Caleb Epstein wrote:
> On Thu, 07 Oct 2004 18:14:53 +0100, Andrew Chapman wrote:
>> The thread functor objects are being passed a lot of references (about 20) to some very large std::vectors. One thread always reads/writes from the start of the vectors to their midpoint, and the second thread from (midpoint+1) to the end of the vectors.
> Perhaps you could provide some code?
// resize all the furXXX std::vectors
furGen.allocate(...);
boost::thread_group threads;
CalcCurvesThread thread1Obj(furGen, 0, midPoint, furCurves, furRootUVs, furRootPositions, furRootNormals, furRootDu, furRootDv, furRootDpDuv, guideCurves, guideLookupTable, furDesc, 1.0);
CalcCurvesThread thread2Obj(furGen, midPoint+1, nCurves-1, furCurves, furRootUVs, furRootPositions, furRootNormals, furRootDu, furRootDv, furRootDpDuv, guideCurves, guideLookupTable, furDesc, 1.0);
threads.create_thread(thread1Obj);
threads.create_thread(thread2Obj);
threads.join_all();
> Are the threads using any sort of synchronization between themselves? Perhaps you're spending a lot of time contending on a mutex if so. If they're just operating on some shared resources (the vectors) with no locking, they should certainly run faster than the single-threaded case on a dual CPU machine.
No, there is no need for any synchronization, as far as I can tell. Each thread is only reading from and writing to different areas of the pre-allocated vectors (thread1 to the first half of the vectors, thread2 to the second half).
It was my assumption that the new child threads are sharing all the data with the host process. However, after watching the threads running so slowly, and it appearing as though they're sleep/wait'ing on system resources, I was wondering if somehow all the stack data in my large data vectors wasn't being copied to and from the threads.
--
Andrew Chapman
Senior Technical Director - Framestore CFC
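On the question of whether the vectors get copied into the threads: launching a thread copies the functor object, but a reference member inside that copy still refers to the original vector. The sketch below illustrates this under stated assumptions: std::thread stands in for boost::thread, and RangeWorker and worker_shares_buffer are hypothetical names, not part of the posted code.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical functor in the style of the posted CalcCurvesThread: it holds a
// reference to shared data plus the inclusive index range it should process.
struct RangeWorker {
    std::vector<float>& values;  // copying the functor copies this reference, not the vector
    std::size_t first;
    std::size_t last;
    const float** seen_buffer;   // out-parameter: buffer address observed inside the thread

    void operator()() const {
        *seen_buffer = values.data();
        for (std::size_t i = first; i <= last; ++i) {
            values[i] += 1.0f;
        }
    }
};

// Runs a RangeWorker over the whole vector on a new thread and reports whether
// the thread saw the caller's buffer rather than a private copy.
bool worker_shares_buffer(std::vector<float>& values) {
    assert(!values.empty());
    const float* seen = nullptr;
    RangeWorker worker{values, 0, values.size() - 1, &seen};
    std::thread t(worker);  // the functor is copied here; the vector is not
    t.join();
    return seen == values.data();
}
```

If the functor stored the vectors by value instead of by reference, each thread launch would copy them, which is worth ruling out when checking the real constructor signature.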
On Mon, Oct 11, 2004 at 12:20:10, Andrew Chapman spake thusly:
> Caleb Epstein wrote:
>> On Thu, 07 Oct 2004 18:14:53 +0100, Andrew Chapman wrote:
>>> The thread functor objects are being passed a lot of references (about 20) to some very large std::vectors. One thread always reads/writes from the start of the vectors to their midpoint, and the second thread from (midpoint+1) to the end of the vectors.
>> Perhaps you could provide some code?
> // resize all the furXXX std::vectors
> furGen.allocate(...);
> boost::thread_group threads;
> CalcCurvesThread thread1Obj(furGen, 0, midPoint, furCurves, furRootUVs, furRootPositions, furRootNormals, furRootDu, furRootDv, furRootDpDuv, guideCurves, guideLookupTable, furDesc, 1.0);
> CalcCurvesThread thread2Obj(furGen, midPoint+1, nCurves-1, furCurves, furRootUVs, furRootPositions, furRootNormals, furRootDu, furRootDv, furRootDpDuv, guideCurves, guideLookupTable, furDesc, 1.0);
> threads.create_thread(thread1Obj);
> threads.create_thread(thread2Obj);
> threads.join_all();
>> Are the threads using any sort of synchronization between themselves? Perhaps you're spending a lot of time contending on a mutex if so. If they're just operating on some shared resources (the vectors) with no locking, they should certainly run faster than the single-threaded case on a dual CPU machine.
> No, there is no need for any synchronization, as far as I can tell. Each thread is only reading from and writing to different areas of the pre-allocated vectors (thread1 to the first half of the vectors, thread2 to the second half).
> It was my assumption that the new child threads are sharing all the data with the host process. However, after watching the threads running so slowly, and it appearing as though they're sleep/wait'ing on system resources, I was wondering if somehow all the stack data in my large data vectors wasn't being copied to and from the threads.
I would bet that there are hidden synchronization costs in the code you are interested in. Can't tell by looking at your code, though. Possibilities:
- std::string usage. Are you copying strings in the thread functor, from the vector or elsewhere? If so, the COW (copy-on-write) used for std::strings involves a lock.
- smart pointer usage (especially copying/assignment) that involves locking, e.g. boost::shared_ptr does this depending on build settings.
- allocation. Lots of little allocations in multi-threaded code can be a source of serialization (heap is locked for allocations). You might see this with explicit allocations or with objects that use the free store if the scope of those objects is small. An example is std::string (in addition to the other problem): if you used one as a buffer in your functor code, you could be thrashing the heap.
All of this is guesswork; you might want to construct the minimal amount of compilable code necessary to reproduce the problem and post it.
Scott
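The allocation point can be illustrated with a small sketch. Both functions below compute the same result; the first constructs a temporary std::vector for every curve, while the second hoists one scratch buffer out of the loop so that each thread would allocate only once. The function names and the per-curve arithmetic are hypothetical, and whether the first form actually serializes two threads depends on the C library's allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Allocates a fresh scratch vector for every curve, so every loop iteration
// goes through the heap allocator (and any lock it may take).
double sum_curves_alloc_each(std::size_t n_curves, std::size_t points_per_curve) {
    double total = 0.0;
    for (std::size_t c = 0; c < n_curves; ++c) {
        std::vector<double> scratch(points_per_curve);  // heap allocation per curve
        for (std::size_t p = 0; p < points_per_curve; ++p) {
            scratch[p] = static_cast<double>(c + p);
        }
        total += std::accumulate(scratch.begin(), scratch.end(), 0.0);
    }
    return total;
}

// Same computation with a single scratch buffer that is allocated once and
// overwritten for each curve.
double sum_curves_reuse_buffer(std::size_t n_curves, std::size_t points_per_curve) {
    std::vector<double> scratch(points_per_curve);      // one allocation in total
    double total = 0.0;
    for (std::size_t c = 0; c < n_curves; ++c) {
        for (std::size_t p = 0; p < points_per_curve; ++p) {
            scratch[p] = static_cast<double>(c + p);
        }
        total += std::accumulate(scratch.begin(), scratch.end(), 0.0);
    }
    return total;
}
```

Running each variant from two threads and comparing wall-clock times would be one way to test whether allocation is the bottleneck before restructuring the real functor.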
participants (3)
- Andrew Chapman
- Caleb Epstein
- Thomas S. Urban