
2013/4/25 Hardy
On 2013/4/23 20:59, Antony Polukhin wrote:
2013/4/23 Hardy
: "A good addition to proposal will be some sketch of trie interface: methods and constructors that Trie class will contain." Could you give me some examples or more advices on that? What I think out now are something like STL set, map, and with some specific methods and variables of Trie.
You are right, it is better to be as close to STL interface as possible. You may also say something like:
" Trie interface will have all the typedefs and methods of std::set excluding: iterator insert (const_iterator position, const value_type& val); ... It will additionally have the following methods:
pair
prefix_range (const value_type& prefix) const; - returns range of elements starting from `prefix`. ... " You may also add some notes that you think will be interesting or just affect design, like "trie implementation will be storing it's size as a separate field to allow getting size in constant time" or "won't be storing it's size leading to liner complexity of size() function"... Some more examples of containers features: http://www.boost.org/doc/libs/1_53_0/doc/html/container/other_features.html
-- Best regards, Antony Polukhin
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Hi, the following is what I conceive of Trie, which I will put to the 'description of work' in my proposal, because of its unmaturity, I will be happy to discuss on it with you, please give me some advice on it so that I can finish it and submit my application on the GSOC site. Thank you very much.
==description== Trie(Radix Tree) is an efficient in memory data structure to deal with string storing and searching. It can be used in many applications, and in comparison to hashtables, it has better expandability and stability. It is a data structure of great importance and wide usage. C++ has already had powerful common key-value containers such as set and map(I will only use map to make the comparison later), but when the keys are strings, map will take O(logN * length(string)) to lookup a key which has a worse performance when handling long strings and large amount of data, and stores every key as a full string which also wastes space. Besides, it is hard to use map traverse some keys with common prefix. Trie is good at handling strings especially the above tasks, so it is needed to improve the performance of handling key-value storage with the type of key being string.
The basic interface of Trie will have similar typedefs and methods to stl::map, with some differences:
pair
prefix_range (const key_type& prefix) const; - returns range of elements starting from ‘prefix’ void prefix_erase(const key_type& prefix) const; - erases elements with keys starting with ‘prefix’ void prefix_count(const key_type& prefix) const; - returns the number of elements with keys starting with ‘prefix’
Pretty good!
key_type get_key(const_iterator position); - returns the key of the iterator on position, because I think it is a waste of space to store the complete key, so users cannot get the key directly by iterator. vector
get_keys(const Trie &trie); - returns the keys of all elements in Trie vector get_keys(const key_type& prefix); - returns the keys of all elements in starting with ‘prefix’ vector get_keys(const_iterator left, const_iterator right); - returns the keys between iterator left and right.
Did not get the idea. Dereferencing an iterator what shall return? Do you mean 'key' here as a character of substring or 'key' here is a string itself?
Besides, considering that traditional Trie has limited scale of using, I want to design my Trie in 3 layers: The bottom one can provide the user-definition key type, which uses like RawTrie
, in which the key can be a sequence of some type(maybe an array of anything can be a key), and every node of Trie can be taken as a root of sub RawTrie, which gives convenience to add Trie into it and even swap a sub Trie. And the second layer will be a specific Trie with the key of chars(when handling lookup and insert, both string and char[] can be adapted to it). And the feature of sub Trie is provided. The class is like Trie . And the third layer will be a Trie without so much freedom of operating as the above two. I will just provide the basic interface as I list above. And the last two layers of Trie will have a version of compressed storage of strings.
Relations between layers are not clear. Bay the way, student applications can now be submitted to https://google-melange.appspot.com/gsoc/homepage/google/gsoc2013. The deadline is 3 May. Do not forget to submit your proposal and add link to this discussion in proposal. -- Best regards, Antony Polukhin