On Sun, 9 Feb 2020 at 07:18, degski
On Sat, 8 Feb 2020 at 11:43, Joana Modesto Hrotko via Boost < boost@lists.boost.org> wrote:
Dear Boost community,
I am a Master's student and currently working on my Master thesis. My Master Thesis consists in implementing the k2-tree data structure in C++ and one of the additional proposals for my thesis would be to integrate my code with the boost graph library.I believe this integration would bring great value to the boost library and also it would provide a better validation for my work. So my question is if this would be something of interest for the library to integrate with boost graph library.
INFO ABOUT K2-TREE: The k2-tree have been proved successful to represent in a very compact way different kinds of binary relations, such as web graphs, RDFs or raster data. Binary relations can be an abstraction to represent the relation between the objects of two collections of different nature: graphs, trees, strings, among others. They can be used in several low-level structures within a more complex information retrieval system, or even replace one of the most used one: an inverted index can be regarded as a binary relation between the vocabulary of terms and the documents where they appear. Moreover, it can represent the relation between terms and web pages, or even social networks or the connection between the web pages in the Web, which is normally represented as a web graph. k2-trees have been also used in other scenarios, such as geographical and RDF data, or images.
Brisaboa, Nieves R., et al. "Compressed representation of dynamic binary relations with applications." Information Systems 69 (2017): 106-123. Coimbra, Miguel E., et al. "On dynamic succinct graph representations." arXiv preprint arXiv:1911.03195 (2019).
Great subject for a thesis. Munro, the man of the beap https://github.com/pfalcon/beap (a dynamic implicit data structure https://github.com/pfalcon/awesome-implicit-data-structures). I don't really understand why you would like a dynamic k2-tree, you can just as well have a static one https://github.com/degski/KD-Tree/blob/master/KD-Tree/ikdtree.hpp [Eytzinger-tree, the levels are implicit and the sorting is recursively done, partially ( with std::nth_element, (in-place) sorting around the median (the nth-element) ) along the way, getting more sorted on lower levels, on my laptop I can easily run the recursion, without exhausting stack-space, till my heap space starts to run out (with default stack size settings)] and copy the data on re-balancing (or re-balance the tree on copying), as, in general, one insertion will completely change the tree, due to the alternating dimensions, and the mechanics of updating is out-performed very quickly by the recursive (in place) construction. Having said that, I think there is still some (maybe a lot of) mileage into exploring how these things can/should be well implemented in C++17.
Best of luck with your thesis.
degski -- @realdegski https://brave.com/google-gdpr-workaround/ "We value your privacy, click here!" Sod off! - degski "Anyone who believes that exponential growth can go on forever in a finite world is either a madman or an economist" - Kenneth E. Boulding "Growth for the sake of growth is the ideology of the cancer cell" - Edward P. Abbey
I just realized, one can completely (and simply) hide the copying/re-construction behind a dynamic (vector-like) interface, and the dynamic implicit kd-tree is a fact. This is now on my TODO-list. KD-trees of dimensions 1, 2 and 3 seem to be the norm (at 4 dimensions one runs already in the curse of dimensionality, and linear search does better). degski -- @realdegski https://brave.com/google-gdpr-workaround/ "We value your privacy, click here!" Sod off! - degski "Anyone who believes that exponential growth can go on forever in a finite world is either a madman or an economist" - Kenneth E. Boulding "Growth for the sake of growth is the ideology of the cancer cell" - Edward P. Abbey