[COUNTERTREE + SUBALLOCATOR ] version 2.0

Hi this message is to announce the version 2.0 of the *[countertree + suballocator]* library. 1 year ago I presented the request for the evaluation of the [countertree+ suballocator] version 1.0. The approval process is stopped due to the absence of review managers, and in the meanwhile I continued with the develop of this version 2.0. Actually the most important libraries and environments for parallel development, *don't provide parallel algorithms for th**e** data structures based on trees* ( set , multiset, map and multimap), and all the operations with these data structures are done with 1 thread. This is due to the difficult to divide the element of the tree between an arbitrary number of threads. (If you want to access to the element in the position 1000, you must go sequentially forward from the position 0 or sequentially backward from the last position. The [countertree + suballocator] library provide data structures set, multiset,map and multimap compatibles with the STL data structures *plus the access to the elements by their position like in a vector, and random access iterators*, You can manage it like a vector. Many parallel algorithms designed for vectors, can be used without changes with the countertree data structures. This make *easy **the **design **of **parallel algorithms* for these data structures. In the library there are a concurrent and a non concurrent versions of all the data structures. The non concurrent version are designed for to be executed inside 1 thread, or for to be accessed by several threads only in read operations (like the STL data structures), they provide a bit more speed than the concurrent version. The concurrent version is *thread safe with all the operations*, one thread can be read and other writing over the same data structure in a safe mode ( as you can see in 2.3 -examples) The concurrent library provide the full set of functions of the STL , plus an *additional set of functions*. These functions have the mission of resolve the problems appeared in the concurrent development. By example, you have a map, find an iterator from a key and with the iterator modify the element, but in the middle, other thread delete the element, and when try to access with the iterator, the program crash, and it is very difficult to reproduce the error for to debug. These additional functions make a pack with the two things, first with a read lock which permit concurrent reads, and when obtain the iterator with an exclusive lock for to modify. These additional functions *prevent the problems* like the previously described, with an *easy interface* for to understand and use for the final user. I don't know if exist any other library with tree based structures with random access iterators. I had search for and didn't find. If someone knows someone, please, say me, I am interested to know something new and collect good ideas. I suppose it don't exist because i didn't see parallel support for data structures based on trees, in any library or parallel tools. I request help to the Boost community because I am in the *moment “perhaps” *. Perhaps I forgot something, perhaps something is wrong, perhaps someone decision was not correct, perhaps it can be done in other way... *Please*, is someone see any fail, any omission, any alternative or simply any comment, please say me, in order to reach the final objective, *provide to the C++ community a good, fast, robust and reliable tool.* In my opinion, this library or other with similar functionality, is the last big part needed for to begin to think about a *concurrent STL* . You can find concurrent libraries with vectors and arrays. You have too, concurrent libraries with containers based on hash functions. With the countertree library you have concurrent set, multiset,map and multimap. The concurrent linked list can be implemented with the concurrent vector_tree ( other data structure of the library). The concurrent STL is a very big project, and must involve many agents and perspectives. From the users associations, to big computers makers and big data providers, passing by the compiler makers, the parallel tools providers... But I am sure the *Boost community* will play a central role by prestige, knowledge and experience designing libraries You can find the documentation and the code in the GitHub https://github.com/fjtapia/countertree_2.0 or you can see online in my dropbox page : https://dl.dropboxusercontent.com/u/8437476/works/countertree_2.0/index.htmlhttps://dl.dropboxusercontent.com/u/8437476/works/countertree_2.0/doc/index.... If you prefer download a file with all the code and the documentation : https://github.com/fjtapia/countertree_2.0/blob/master/doc/countertree_2.0.z... Sincerely yours Francisco Tapia fjtapia@gmail.com ** I you don't try the library, perhaps the most interested parts of the documentation is the 1.5.- Next improvements. Where explain the future parallel algorithms for the library and how they can improve others areas of the computation with the augmented trees.*

Francisco José Tapia
[snip] The concurrent version is *thread safe with all the operations*, one thread can be read and other writing over the same data structure in a safe mode ( as you can see in 2.3 -examples)
I haven't looked too carefully at the library, but certainly a tree that maintains rank information is a useful data structure. In principle it seems it would be ideal to make it as generic as possible, i.e. built on top of a generic binary search tree that supports maintaining an arbitrary aggregate statistic, i.e. with a user-specified type and user-specified update function. Your countertree, which would provide random access iterators, etc. could then be a simple wrapper over the more generic tree library. It also seems it would be ideal if your library could interface with the Boost Multi Index Containers library, as that would maximize usage options, but I don't know how feasible that is. Another comment is that my own experience is that more often than not, in a multithreaded program, synchronization is needed for more than just maintaining the invariants of a particular data structure, like your countertree, and therefore it generally makes sense for any necessary locking to be handled by the application rather than by any particular data structure library. That is, the set of operations that need to appear atomic will depend on the application, and may likely involve things outside the control of your single data structure. Lock-free data structures are the exception, where you actually gain something by integrating the syncronization with the data structure.

Hi Jeremy
Thanks by your interest in the library
About the interface with Multi Index, I think it's a good idea, but I think
first we must build a robust library with all the functionality, and to be
checked by the Boost users, and after export ideas and code to others
libraries. I insist in the checking of the users for don't export bugs and
problems.
I think Multi Index and countertree can share many useful information, with
the additional advantage than the author of Multi Index , and I are living
in the same city, and comment face to face taking a coffee, is more
pleasant.
About the synchronization, my idea is to do a lock-free data structure from
the user view-point. The locks are automatic, and don't be managed by the
users. The users don' need lock, only must take care about what they want
to do. But for a safe code they must use the new functions. As say in the
message
*“The concurrent library provide the full set of functions of the STL ,
plus an additional set of functions. These functions have the mission of
resolve the problems appeared in the concurrent development. By example,
you have a map, find an iterator from a key and with the iterator modify
the element, but in the middle, other thread delete the element, and when
try to access with the iterator, the program crash, and it is very
difficult to reproduce the error for to debug. These additional functions
make a pack with the two things, first with a read lock which permit
concurrent reads, and when obtain the iterator with an exclusive lock for
to modify.*
*These additional functions prevent the problems like the previously
described, with an easy interface for to understand and use for the final
user.”*
This design permit to the user think the problem as if you have isolated
threads, because the relation between the threads through the data
structure is safe.
If you have any other question, please, ask me, and I will try to respond.
The questions are an excellent source of ideas.
Sincely yours
Francisco Tapia
2013/8/27 Jeremy Maitin-Shepard
Francisco José Tapia
writes: [snip] The concurrent version is *thread safe with all the operations*, one thread can be read and other writing over the same data structure in a safe mode ( as you can see in 2.3 -examples)
I haven't looked too carefully at the library, but certainly a tree that maintains rank information is a useful data structure. In principle it seems it would be ideal to make it as generic as possible, i.e. built on top of a generic binary search tree that supports maintaining an arbitrary aggregate statistic, i.e. with a user-specified type and user-specified update function. Your countertree, which would provide random access iterators, etc. could then be a simple wrapper over the more generic tree library.
It also seems it would be ideal if your library could interface with the Boost Multi Index Containers library, as that would maximize usage options, but I don't know how feasible that is.
Another comment is that my own experience is that more often than not, in a multithreaded program, synchronization is needed for more than just maintaining the invariants of a particular data structure, like your countertree, and therefore it generally makes sense for any necessary locking to be handled by the application rather than by any particular data structure library. That is, the set of operations that need to appear atomic will depend on the application, and may likely involve things outside the control of your single data structure. Lock-free data structures are the exception, where you actually gain something by integrating the syncronization with the data structure.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Francisco José Tapia
Hi Jeremy
Thanks by your interest in the library
About the interface with Multi Index, I think it's a good idea, but I think first we must build a robust library with all the functionality, and to be checked by the Boost users, and after export ideas and code to others libraries. I insist in the checking of the users for don't export bugs and problems.
It is reasonable to not want to deal with the added complexity of Multi Index initially, though if you decide to add it later it seems you would either have to maintain two copies of the code, or make the non Multi Index version just a wrapper over Multi Index.
About the synchronization, my idea is to do a lock-free data structure from the user view-point. The locks are automatic, and don't be managed by the users. The users don' need lock, only must take care about what they want to do. But for a safe code they must use the new functions.
The basic issue is that if you want the user to not have to worry about locking, then you must perfectly anticipate precisely which operations the user needs to appear atomic. You have added various predicate-enabled operations, but there is no guarantee that one of those happens to fit what the user needs. For example: - the user may want to perform some combined insert/erase operation and have it appear atomic to other threads; - any use of iterator traversal will require manual locking by the user; - the user may want to atomically combine modifications to the data structure with modifications to the values in the data structure; - the user may need to atomically modify multiple data structures.
From my perspective, forcing users writing multithreaded code to think carefully about synchronization/locking is a good thing. It is true that
x.insert_if(..., [&]{ // check some stuff here }); is shorter than { std::lock_guardstd::mutex guard(my_mutex); if (/* check some stuff here */) { x.insert(...); } } but it isn't that much shorter, and the second version is more straightforward and is much more general. Perhaps others agree with you though that having a convenience interface for users, even if it is limited in capability, is still useful. One small comment regarding your interface of predicate-enabled functions is to pass functors using a template type, like the standard library and other boost libraries generally do, rather than using std::function, as std::function introduces extra overhead.

On Aug 28, 2013, at 5:06 PM, Jeremy Maitin-Shepard
Francisco José Tapia
writes: About the interface with Multi Index, I think it's a good idea, but I think first we must build a robust library with all the functionality, and to be checked by the Boost users, and after export ideas and code to others libraries. I insist in the checking of the users for don't export bugs and problems.
It is reasonable to not want to deal with the added complexity of Multi Index initially, though if you decide to add it later it seems you would either have to maintain two copies of the code, or make the non Multi Index version just a wrapper over Multi Index.
At the very least, it is wise to see whether the code should be refactored, to simplify Multi-Index integration, before too much more work is done that might be harder to refactor later.
About the synchronization, my idea is to do a lock-free data structure from the user view-point. The locks are automatic, and don't be managed by the users. The users don' need lock, only must take care about what they want to do. But for a safe code they must use the new functions.
The basic issue is that if you want the user to not have to worry about locking, then you must perfectly anticipate precisely which operations the user needs to appear atomic. You have added various predicate-enabled operations, but there is no guarantee that one of those happens to fit what the user needs.
Absolutely. The special operations provided don't cover every use case and never can without creating an enormous, hard to grok library. Leave the creation of ad hoc, thread-safe, meta-operations to the users.
From my perspective, forcing users writing multithreaded code to think carefully about synchronization/locking is a good thing.
+1
It is true that
x.insert_if(..., [&]{ // check some stuff here });
is shorter than
{ std::lock_guardstd::mutex guard(my_mutex); if (/* check some stuff here */) { x.insert(...); } }
but it isn't that much shorter, and the second version is more straightforward and is much more general.
+1 ___ Rob (Sent from my portable computation engine)

Obviously, I can't pretend cover all possible cases with the interface functions, but a great percent of them yes. Mi idea is to try to resolve in a easy way, the majority of the possible cases, because many programmers don't know in deep the lock functions and need to resolve simple problems with concurrent data structures. The idea of the function as parameter is for to do something. This can be so complex as you need. Inside a function you can call other function of the same data structure, because the internal mutex used for to lock is recursive. If this is not sufficient, you can use the lock data of the data structure directly because is public. In all the data structures is the variable BD. The description and functions of the locks is in the file barrier.hpp in countertree folder of the code. All the data structures have typedef in order to simplify the names typedef typename config_barrier<cnc>::barrier_data barrier_data ; typedef typename config_barrier<cnc>::barrier_read barrier_read ; typedef typename config_barrier<cnc>::barrier_modify barrier_modify ; you have a cntree::set<int> S1; your_function( ......) { cntree::set<int>::barrier_modify BM ( S1.BD); . BM.wait_no_readers(); . }; but if your lock concern several data structures you must create your own lock data structures and manage them. About the overhead of the std::functions I didn't know, but I suspect because I didn't see it in the functions of the standard library. I will change soon. Thanks Francisco Tapia

I just modified the code of the class, changing the format of the functions used as parameters from the format of std::function to the new as template parameter, in order to eliminate the overhead of the call. ( Thanks to Rob Steward and Jeremy Martin-Shepard by their comments and suggestions) I changed too the documentation about the classes of the library, and specially the new functions and the lock system and how to use explicitly in the functions of the users. I hope it will be more clear now
participants (3)
-
Francisco José Tapia
-
Jeremy Maitin-Shepard
-
Rob Stewart