Hello,

My name is Dan Lincan. I'm a 4th year undergraduate student in Computer Science at the "Politehnica" University of Bucharest and I'm interested in the project "Boost.Thread/ThreadPool".

I have studied the documents provided as resources and I would really appreciate it if you helped me better understand this project. Are there any tasks that are meant to get solved for the proposal? How do I start?

I am aware that it will not be an easy task at all, especially because I don't have much experience with Boost, but I'm willing to learn and work as much as needed to get it done.

Thank you for your time,
Dan Lincan
On 10/04/13 17:02, Dan Lincan wrote:
Hello,
My name is Dan Lincan. I'm a 4th year undergraduate student in Computer Science at the "Politehnica" University of Bucharest and I'm interested in the project "Boost.Thread/ThreadPool".
I have studied the documents provided as resources and I would really appreciate it if you helped me better understand this project. Are there any tasks that are meant to get solved for the proposal? How do I start?
I am aware that it will not be an easy task at all, especially because I don't have much experience with boost, but I'm willing to learn and work as much as it is needed to get it done.
Hi,

The project Boost.ThreadPool was developed taking into account the old Boost.Thread interface, when the future library was not yet accepted in Boost. As far as I know the author (O. Kowalke) is working on an alternative design based on Boost.Context/Fibers instead of threads. IMO both approaches have a use depending on the application context.

Two kinds of thread pools could be provided:
* one simple, and
* another more sophisticated, based on work stealing.

The scheduled tasks could be non-blocking or blocking on the completion of other tasks.

The goal of the project is to use the existing implementation as a base, provide an interface that is compatible with the new Boost.Thread interface (based on the C++11 standard), and refactor it to avoid duplications (e.g. make use of a generic concurrent queue, ...).

Once a thread pool with these characteristics is available, we can start to adapt the boost::async and boost::future::then functions to take a scheduler as a parameter.

While this seems like a big project, there is already a lot of work there, and the services can be provided progressively:
1st: a simple thread pool scheduling non-blocking tasks
2nd: add work stealing
3rd: add blocking tasks

HTH,
Vicente
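A minimal sketch of the kind of adaptation meant here, assuming only that a pool exposes a submit(void()) member; async_on and the pool concept are illustrative names, not existing Boost API, and std:: types are used for brevity:

#include <functional>
#include <future>
#include <memory>
#include <utility>

// Hypothetical adapter: run a callable on a pool and expose its result
// through a future, mirroring what an executor-aware boost::async could do.
template <class Pool, class F>
auto async_on(Pool& pool, F f) -> std::future<decltype(f())>
{
    typedef decltype(f()) R;
    // packaged_task is move-only, so share it to fit into a copyable closure.
    std::shared_ptr<std::packaged_task<R()> > task(
        new std::packaged_task<R()>(std::move(f)));
    std::future<R> result = task->get_future();
    pool.submit([task] { (*task)(); });   // the pool only needs submit(void())
    return result;
}
// usage, assuming 'pool' models that concept:
//   std::future<int> r = async_on(pool, [] { return 42; });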
On 12/04/13 00:30, Vicente J. Botet Escriba wrote:
On 10/04/13 17:02, Dan Lincan wrote:
Hello,
My name is Dan Lincan. I'm a 4th year undergraduate student in Computer Science at the "Politehnica" University of Bucharest and I'm interested in the project "Boost.Thread/ThreadPool".
I have studied the documents provided as resources and I would really appreciate it if you helped me better understand this project. Are there any tasks that are meant to get solved for the proposal? How do I start?
I am aware that it will not be an easy task at all, especially because I don't have much experience with boost, but I'm willing to learn and work as much as it is needed to get it done.
Hi,
The project Boost.ThreadPool was developed taking into account the old Boost.Thread interface, when the future library was not yet accepted in Boost. As far as I know the author (O. Kowalke) is working on an alternative design based on Boost.Context/Fibers instead of threads. IMO both approaches have a use depending on the application context.
Two kinds of thread pools could be provided:
* one simple, and
* another more sophisticated, based on work stealing.
The scheduled tasks could be non-blocking or blocking on the completion of other tasks.
The goal of the project is to use the existing implementation as a base, provide an interface that is compatible with the new Boost.Thread interface (based on the C++11 standard), and refactor it to avoid duplications (e.g. make use of a generic concurrent queue, ...).
Once a thread pool with these characteristics is available, we can start to adapt the boost::async and boost::future::then functions to take a scheduler as a parameter.
While this seems like a big project, there is already a lot of work there, and the services can be provided progressively:
1st: a simple thread pool scheduling non-blocking tasks
2nd: add work stealing
3rd: add blocking tasks
Ah, I forgot. One of the points to take into account in the interface is move semantics (use of Boost.Move). Vicente
Hello Vicente,
2013/4/12 Vicente J. Botet Escriba
The project Boost.ThreadPool was developed taking into account the old Boost.Thread interface, when the future library was not yet accepted in Boost. As far as I know the author (O. Kowalke) is working on an alternative design based on Boost.Context/Fibers instead of threads. IMO both approaches have a use depending on the application context.
You are referring to boost.task (the former name was boost.threadpool)? Yes, I'm still working on it - as you know I was asked to move some of the functionality into separate libraries (-> boost.context, boost.fiber), which I will finish soon. Boost.task already contains a threadpool where each worker-thread schedules fibers using boost.fiber in order to handle blocking tasks etc. Oliver
Hello,

I would like to know if I'm going in the right direction.

Threadpool ideas:
1. simple
   * fixed number of threads (possibly bound to the number of CPUs/cores)
   * add_task
2. simple + scheduling
   * variable number of threads
     - specify boundaries (min/max)
     - a lightweight algorithm in place to determine when to add/remove threads within the boundaries to increase throughput
   * add_task
   * add_task_after(time_point)
     - the task will be scheduled only after time_point has passed
     - 2 possibilities (relative time, absolute time)
3. complex = simple + scheduling + work-stealing
   * variable number of threads
   * add_task
   * add_task_after(time_point)
   * work-stealing

All of them will use the chrono library for time measurement.

For the proposal, do I have to point out where changes should be made in the current implementation at [1], or do I need to provide a prototype of a new interface?

[1] https://svn.boost.org/svn/boost/sandbox/async/libs/tp/doc/html/index.html

Thank you,
Dan
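A rough, interface-only sketch of the three variants listed above; all names and signatures are tentative illustrations, not existing Boost API:

#include <cstddef>
#include <functional>
#include <boost/chrono.hpp>

// 1. simple: fixed number of worker threads.
class thread_pool
{
public:
    explicit thread_pool(std::size_t thread_count);   // e.g. hardware_concurrency()
    void add_task(std::function<void()> task);        // enqueue; runs when a worker is free
};

// 2. simple + scheduling: thread count varies between min and max,
//    and tasks can also be deferred to a later point in time.
class scheduled_thread_pool
{
public:
    scheduled_thread_pool(std::size_t min_threads, std::size_t max_threads);
    void add_task(std::function<void()> task);
    void add_task_at(boost::chrono::steady_clock::time_point abs_time,
                     std::function<void()> task);     // run no earlier than abs_time
    void add_task_after(boost::chrono::steady_clock::duration rel_time,
                        std::function<void()> task);  // run after rel_time has elapsed
};

// 3. complex: same interface as 2., plus per-worker queues with work stealing.
class work_stealing_thread_pool;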
On 21/04/13 14:12, Dan Lincan wrote:

Hello,
I would like to know if I'm going in the right direction.
Threadpool ideas:
1. simple
   * fixed number of threads (possibly bound to the number of CPUs/cores)
   * add_task

I don't like add_task. Maybe better use async, launch or submit. What would be the result?
2. simple + scheduling
   * variable number of threads
     - specify boundaries (min/max)
     - a lightweight algorithm in place to determine when to add/remove threads within the boundaries to increase throughput

What would be the advantage of having a dynamic number of threads? What do you mean by scheduler?

   * add_task
   * add_task_after(time_point)
     - the task will be scheduled only after time_point has passed
     - 2 possibilities (relative time, absolute time)

It would be better to distinguish between launch after a duration and launch at a time point. Why are these time related functions not on the simple thread pool?
3. complex = simple + scheduling + work-stealing
   * variable number of threads
   * add_task
   * add_task_after(time_point)
   * work-stealing
All of them will use the chrono library for time measurement.
For the proposal, do I have to point out where changes should be made in the current implementation at [1], or do I need to provide a prototype of a new interface?

I would like to see a new interface in the proposal. [1] could be used to get insight on the domain. There is an ongoing C++1y proposal [2] that could help you. I would prefer, however, that the interface not use dynamic polymorphism (inheritance) but a static one (concepts). We can always adapt a dynamic polymorphic interface on top of the static one. The book CCiA [3] contains a lot of useful information related to the project; a mandatory book to read for the project. In particular there are implementations of static polymorphic thread-pools, including work-stealing.

Best,
Vicente
[1] https://svn.boost.org/svn/boost/sandbox/async/libs/tp/doc/html/index.html
[2] Executors and schedulers, revision 1: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3562.pdf
[3] C++ Concurrency in Action by A. Williams.
On Sun, Apr 21, 2013 at 4:47 PM, Vicente J. Botet Escriba
On 21/04/13 14:12, Dan Lincan wrote:
Hello,
I would like to know if I'm going in the right direction.
Threadpool ideas:
1. simple
   * fixed number of threads (possibly bound to the number of CPUs/cores)
   * add_task
I don't like add_task. Maybe better use async, launch or submit. What would be the result?
The task/job would be added in the threadpool queue and, if there is a free thread, it would be run.
2. simple + scheduling
   * variable number of threads
     - specify boundaries (min/max)
     - a lightweight algorithm in place to determine when to add/remove threads within the boundaries to increase throughput
What would be the advantage of having a dynamic number of threads?
Some of the submitted jobs might be blocking and it would take a lot of time for the jobs in queue to get completed. Increasing the number of threads would help in this case because it would speed up the completion of jobs. A limitation has to be in place though to avoid context switching overhead. Also, if there are a lot of idle threads for a long time, the number could be decreased to clear up resources.
What do you mean by scheduler?
The scheduler will determine when to run the algorithm that adjusts the number of threads that are running. This could be every time a thread is looking for work, if the algorithm is light (O(1)).
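A purely illustrative sketch of such a constant-time check, run each time a worker looks for work; the counters and thresholds below are invented for the example and are not part of any proposal:

#include <atomic>
#include <cstddef>

// Illustrative only: decide, in constant time, whether the pool should
// grow or shrink. Called by a worker each time it looks for new work.
struct pool_stats
{
    std::atomic<std::size_t> queued;        // tasks waiting in the queue
    std::atomic<std::size_t> idle_workers;  // workers currently waiting for work
    std::size_t min_threads;
    std::size_t max_threads;
    std::size_t current_threads;
};

enum resize_hint { no_resize, grow, shrink };

resize_hint check_resize(const pool_stats& s)
{
    // Grow if work is piling up and every worker is busy.
    if (s.queued.load() > 0 && s.idle_workers.load() == 0 &&
        s.current_threads < s.max_threads)
        return grow;
    // Shrink if many workers sit idle and we are above the minimum.
    if (s.idle_workers.load() > s.current_threads / 2 &&
        s.current_threads > s.min_threads)
        return shrink;
    return no_resize;
}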
* add_task
* add_task_after(time_point)
  - the task will be scheduled only after time_point has passed
  - 2 possibilities (relative time, absolute time)
It would be better to distinguish between launch after a duration and launch at a time point.
Then two functions should be provided:
* add_task_after(abs_time) - fixed point in time
* add_task_after(rel_time) - threadpool_start_time + rel_time
Why these time related functions are not on the simple thread pool?
It would introduce overhead which might not be O(1) depending on the algorithm used. Every time a thread would request work/job, checks would be in place to see if there is a job in the queue that can be executed at this point in time. This is the main reason I have separated them. 1. has zero overhead.
3. complex = simple + scheduling + work-stealing
   * variable number of threads
   * add_task
   * add_task_after(time_point)
   * work-stealing
All of them will use the chrono library for time measurement.
For the proposal, do I have to point out where changes should be made in the current implementation at [1], or do I need to provide a prototype of a new interface?
I would like to see a new interface in the proposal. [1] could be used to get insight on the domain. There is an ongoing C++1y proposal [2] that could help you. I would prefer, however, that the interface not use dynamic polymorphism (inheritance) but a static one (concepts). We can always adapt a dynamic polymorphic interface on top of the static one. The book CCiA [3] contains a lot of useful information related to the project; a mandatory book to read for the project. In particular there are implementations of static polymorphic thread-pools, including work-stealing.
Great! I have started reading it. Regards, Dan
[1] https://svn.boost.org/svn/boost/sandbox/async/libs/tp/doc/html/index.html
[2] Executors and schedulers, revision 1: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3562.pdf
[3] C++ Concurrency in Action by A. Williams.
On 21/04/13 18:19, Dan Lincan wrote:
>> I don't like add_task. Maybe better use async, launch or submit.
>> What would be the result?
> The task/job would be added in the threadpool queue and, if there is a
> free thread, it would be run.

IMHO, this is an implementation detail. What the user wants is to launch a function asynchronously. Anyway, naming is always a source of conflict. This should be discussed specifically on the Boost ML during the project.

>> What would be the advantage of having a dynamic number of threads?
> Some of the submitted jobs might be blocking

Humm, I don't think it is a good idea that the submitted jobs could block the worker thread. I think this could be a source of deadlock, but of course experimenting will always help us to learn more things. Maybe the proposal should include thread pools for blocking and non-blocking tasks.

> and it would take a lot of time for the jobs in queue to get completed.
> Increasing the number of threads would help in this case because it
> would speed up the completion of jobs. A limitation has to be in place
> though to avoid context switching overhead.
> Also, if there are a lot of idle threads for a long time, the number
> could be decreased to clear up resources.

I need to see some valid examples for which there is a need to block the worker thread. If you can include some in the proposal, this will help to understand the motivation.

> Then two functions should be provided:
> * add_task_after(abs_time) - fixed point in time
> * add_task_after(rel_time) - threadpool_start_time + rel_time

submit_at, submit_after?

>> Why are these time related functions not on the simple thread pool?
> It would introduce overhead which might not be O(1) depending on the
> algorithm used. Every time a thread would request work/job, checks
> would be in place to see if there is a job in the queue that can be
> executed at this point in time. This is the main reason I have
> separated them. 1. has zero overhead.

I see your point of view. All this depends on how these operations are implemented. I need to think more about it.

> Great! I have started reading it.

Do not forget to send your proposal today.

Best,
Vicente
Hello,
Do not forget to send your proposal today.
This took me a little by surprise. I thought the registration period starts today (22 April) according to [1]. If not, where should I send the proposal? To this list?

[1] http://www.google-melange.com/document/show/gsoc_program/google/gsoc2013/hel...

Best Regards,
Dan
On 22/04/13 01:32, Dan Lincan wrote:
Hello,
Do not forget to send your proposal today.

This took me a little by surprise. I thought the registration period starts today (22 April) according to [1]. If not, where should I send the proposal? To this list?
You are right. It starts on the 22nd, that is, today. We wanted to discuss the proposals on this ML so that people of the Boost community other than the mentors could give advice before moving them to the GSoC. Best, Vicente
Hello,

You can find my proposal for the thread_pool project at [1]. Please tell me how I can improve it and where to give more details or further explanations. I have avoided giving implementation details and focused on sketching the interface.

[1] http://danlincan.3owl.com/gsoc/Proposal.pdf

Regards,
Dan
On 25/04/13 17:42, Dan Lincan wrote:
Hello,
You can find my proposal for the thread_pool Project at [1]. Please tell me how I can improve it and where to give more details, further explanations. I have avoided giving implementation details and focused on sketching the interface.
[1] http://danlincan.3owl.com/gsoc/Proposal.pdf
Hi,
Please apply to GSoC with your proposal as soon as possible. You will get other suggestions there on how to improve it. I will come back later. Best, Vicente
On 25/04/13 17:42, Dan Lincan wrote:
Hello,
You can find my proposal for the thread_pool Project at [1]. Please tell me how I can improve it and where to give more details, further explanations. I have avoided giving implementation details and focused on sketching the interface.
I have some questions:

* Why doesn't the submit function of all the thread pools return the same thing?
* What is the advantage of returning a future from the submit call?
* Would the destructor of the future resulting from the submit call wait until the task has finished?
* What about having a specific time based pool that will submit the function to another pool once the duration/time_point has elapsed/been reached? Or specific free functions submit_after/submit_at that use a hidden thread/queue to manage the time constraint?
* I would provide a submit function that has as parameters the function to call and its arguments, as std::async, std::thread::thread, or std::packaged_task provide, so that the user is not forced to use bind.
* I would move the time argument to be the first one of the time based functions so that the preceding point can be made possible for these functions also.
* For a work-stealing thread pool the user would need a function to force the scheduling of new jobs when it needs to wait for some jobs to finish.
* I don't see anything about cancellation of submitted functions. Could you comment on this?
* From the interface all the pools are non-blocking, that is, the queues are not bounded. Have you some thoughts about thread pools that have bounded queues and that could block or tell the user that the queues are congested, ...
* Quite frequently we need to submit jobs that need to be handled in sequential order; what do you propose for this use case?
* In addition to submitting a job after/at a given duration/time_point has elapsed/been reached, we often need to submit a job that needs the result of another job. How would a user be able to do it? Would the library help her/him?
* It would be great to reference existing libraries/proposals and how your proposal solves limitations you can find in the referenced libraries.
* How can these thread pools be used with an updated async function that uses thread pools?

Best,
Vicente
Hello, I have tried to answer most of the questions.
* Why doesn't the submit function of all the thread pools return the same thing?
It's omitted in 1. thread_pool for increased performance.
* What is the advantage of returning future from submit call?
Users can wait for the function call to finish and retrieve the result.
* Would the destructor of the future resulting of the submit call wait until the task has been finished?
After looking at [1], it wouldn't.
* What about having a specific time based pool that will submit the function to another pool once the duration/time_point has elapsed/reached?
It can work but the problem is how do I know how many threads to give to the second pool?
Or specific free functions submit_after/submit_at that use a hidden thread/queue to manage with the time constraint?
A separate thread for these two functions seems like a solution. Two priority queues could be used to store these functions/tasks. To determine when to move the function/task to the global queue of the threadpool, some timers could be used. When a new task/function is submitted, the priority queue would update and, if needed, the timer too (if the submitted task is the first to be executed).
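A small sketch of that dispatcher idea, assuming the target pool has a submit(void()) member; every name here is illustrative. One thread waits on a time-ordered queue and forwards each task to the pool when its deadline arrives; submitting an earlier task wakes it so it can re-arm its wait.

#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Illustrative sketch of a timed dispatcher in front of a pool.
// Note: entries not yet due are simply dropped on shutdown in this sketch.
template <class Pool>
class timed_dispatcher
{
    typedef std::chrono::steady_clock clock;
    typedef std::pair<clock::time_point, std::function<void()> > entry;

    struct later { bool operator()(const entry& a, const entry& b) const
                   { return a.first > b.first; } };   // earliest deadline on top

    Pool& pool_;
    std::priority_queue<entry, std::vector<entry>, later> queue_;
    std::mutex m_;
    std::condition_variable cv_;
    bool stop_;
    std::thread worker_;

public:
    explicit timed_dispatcher(Pool& pool)
        : pool_(pool), stop_(false), worker_(&timed_dispatcher::run, this) {}

    ~timed_dispatcher()
    {
        { std::lock_guard<std::mutex> lk(m_); stop_ = true; }
        cv_.notify_one();
        worker_.join();
    }

    void submit_at(clock::time_point t, std::function<void()> f)
    {
        { std::lock_guard<std::mutex> lk(m_); queue_.push(entry(t, std::move(f))); }
        cv_.notify_one();   // wake the worker: the new task may be the earliest
    }

    void submit_after(clock::duration d, std::function<void()> f)
    { submit_at(clock::now() + d, std::move(f)); }

private:
    void run()
    {
        std::unique_lock<std::mutex> lk(m_);
        while (!stop_) {
            if (queue_.empty()) { cv_.wait(lk); continue; }
            clock::time_point next = queue_.top().first;
            if (cv_.wait_until(lk, next) == std::cv_status::timeout &&
                !queue_.empty() && queue_.top().first <= clock::now()) {
                std::function<void()> task = queue_.top().second;
                queue_.pop();
                lk.unlock();
                pool_.submit(std::move(task));   // forward to the real pool
                lk.lock();
            }
        }
    }
};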
* I would provide a submit function that has as parameter the function to call and its parameters, as std::async, std::thread::thread, or std::packaged_task provide, so that the user is not forced to use bind.
Will do.
* For a work-stealing thread pool the user would need a function to force the scheduling of new jobs when it needs to wait for some jobs to finish.
Can you explain, please?
* From the interface all the pools are non-blocking, that is the queue are not bounded. Have you some thought about thread pool that have bounded queues and that could block or tell the user that the queues are congested,
A new function could be added so the user can check if the queue is full.
* Quite frequently we need to submit jobs that need to be handled in sequential order; what do you propose for this use case?
This is a similar case to the submit_at and submit_after functions. Instead of time we'd introduce priority.
* In addition to submitting a job after/at a given duration/time_point have been elapsed/reached, we often need to submit a job that needs the result of another job. How a user would be able to do it. Would the library help her/him?
Not in the form it is now.
* It would be great to reference existing libraries/proposals and how your proposal solves limitations you can find in the referenced libraries.
The problem is that I cannot give solutions that solve all the problems of multiple other libraries without performance costs.

[1] https://svn.boost.org/svn/boost/trunk/boost/thread/future.hpp

Thank you,
Dan
On 26/04/13 01:01, Dan Lincan wrote:

Hello,
I have tried to answer most of the questions.
* Why doesn't the submit function of all the thread pools return the same thing?

It's omitted in 1. thread_pool for increased performance.
* What is the advantage of returning a future from the submit call?

Users can wait for the function call to finish and retrieve the result.

It would be better to split the responsibilities: the thread pool takes care of void() functions and async (or a free submit function) takes care of functions returning a value.
* Would the destructor of the future resulting from the submit call wait until the task has been finished?

After looking at [1], it wouldn't.

The future returned by boost::async blocks in its destructor.
* What about having a specific time based pool that will submit the function to another pool once the duration/time_point has elapsed/been reached?

It can work, but the problem is: how do I know how many threads to give to the second pool?

I don't understand. The second pool will be created before the time based one. The time based one has the first as a parameter:

thread_pool tp(4);
time_based_thread_pool tbtp(tp);
tbtp.submit_at(t, f); // will submit the function 'f' to the pool 'tp' at time_point 't'
tbtp.submit(f);       // will submit the function 'f' to the pool 'tp' immediately
Or specific free functions submit_after/submit_at that use a hidden thread/queue to manage the time constraint?

A separate thread for these two functions seems like a solution. Two priority queues could be used to store these functions/tasks. To determine when to move the function/task to the global queue of the threadpool, some timers could be used. When a new task/function is submitted, the priority queue would update and, if needed, the timer too (if the submitted task is the first to be executed).
* I would provide a submit function that has as parameter the function to call and its parameters, as std::async, std::thread::thread, or std::packaged_task provide, so that the user is not forced to use bind. Will do.
* For a work-stealing thread pool the user would need a function to force the scheduling of new jobs when it needs to wait for some jobs to finish. Can you explain, please?
I don't see how a task could block on another one otherwise. I want to be able to do something like:

auto t1 = submit(tp, f1);
auto t2 = submit(tp, f2);
// wait until t1 and t2 have been set
tp.reschedule_until([&] { return t1.valid() && t2.valid(); });
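A sketch of the loop such a reschedule_until could run; try_pull is an assumed pool member that pops a pending task if one is available, and nothing here is existing API:

#include <functional>
#include <thread>

// Illustrative: instead of blocking, the calling worker keeps executing
// other queued tasks until the predicate is satisfied, so tasks that wait
// on other tasks do not starve the pool.
template <class Pool, class Predicate>
void reschedule_until(Pool& pool, Predicate done)
{
    std::function<void()> task;
    while (!done()) {
        if (pool.try_pull(task))      // assumed member: pop a pending task, if any
            task();                   // run it inline on this worker thread
        else
            std::this_thread::yield();
    }
}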
* From the interface all the pools are non-blocking, that is the queue are not bounded. Have you some thought about thread pool that have bounded queues and that could block or tell the user that the queues are congested, A new function could be added so the user can check if the queue is full.
But this state will be spurious.
* Quite frequently we need to submit jobs that need to be handled in sequential order; what do you propose for this use case?

This is a similar case to the submit_at and submit_after functions. Instead of time we'd introduce priority.
I was thinking much more of a degraded thread_pool with only one worker thread, as a serial_executor (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3562.pdf).
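A bare-bones sketch of such a serial executor: one worker thread, strict FIFO order. The names are illustrative and this is not the n3562 interface itself.

#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Illustrative: tasks are executed one at a time, in submission order.
class serial_executor
{
    std::deque<std::function<void()> > queue_;
    std::mutex m_;
    std::condition_variable cv_;
    bool stop_;
    std::thread worker_;

public:
    serial_executor() : stop_(false), worker_(&serial_executor::run, this) {}

    ~serial_executor()
    {
        { std::lock_guard<std::mutex> lk(m_); stop_ = true; }
        cv_.notify_one();
        worker_.join();
    }

    void submit(std::function<void()> f)
    {
        { std::lock_guard<std::mutex> lk(m_); queue_.push_back(std::move(f)); }
        cv_.notify_one();
    }

private:
    void run()
    {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return stop_ || !queue_.empty(); });
                if (queue_.empty()) return;      // stop requested and queue drained
                task = std::move(queue_.front());
                queue_.pop_front();
            }
            task();                              // strictly one task at a time, FIFO
        }
    }
};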
* In addition to submitting a job after/at a given duration/time_point has elapsed/been reached, we often need to submit a job that needs the result of another job. How would a user be able to do it? Would the library help her/him?

Not in the form it is now.

If, in addition to the thread pools whose submit doesn't return anything, you provide a submit function that takes a thread pool and a function to submit, and that returns a future

future<> submit(thread_pool, function);

then the user could use the future continuation interface future::then:

submit(tp, f).then(g);

Please take a look at http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3558.pdf for the idea.
* It would be great to reference existing libraries/proposals and how your proposal solves limitations you can find in the referenced libraries.

The problem is that I cannot give solutions that solve all the problems of multiple other libraries without performance costs.
As for example? Ah, I forgot to mention job cancellation. How would you provide it? Best, Vicente
participants (3)
- Dan Lincan
- Oliver Kowalke
- Vicente J. Botet Escriba