[Multi_index] Performance like sequenced.cpp example
Hi,
I have to count a lot of words. Up to now I did it with MySQL, because it
was easy, and the result is saved there anyway. Now I thought I could speed
this up a little by using a Multi_index list internally to store the
words, so that I only have to insert the distinct words. The words are stored
in a UnicodeString from the ICU library.
My code is very close to the one in the example "sequenced.cpp".
I'm using the following definition:
typedef multi_index_container<
UnicodeString,
indexed_by<
sequenced<>,
ordered_non_unique
text_container;
typedef nth_index
On Mon, April 30, 2007 19:47, Manuel Jung wrote:
> Hi,
> I have to count a lot of words. Up to now I did it with MySQL, because it was easy, and the result is saved there anyway. Now I thought I could speed this up a little by using a Multi_index list internally to store the words, so that I only have to insert the distinct words. The words are stored in a UnicodeString from the ICU library. My code is very close to the one in the example "sequenced.cpp". I'm using the following definition:
> typedef multi_index_container< UnicodeString, indexed_by< sequenced<>, ordered_non_unique
> text_container;
> typedef nth_index
> ::type ordered_text;
> text_container tc;
> I'm inserting new words with "tc.push_back(UnicodeString(NewWord));" and counting them exactly as in the example. I thought this would be fast, but it isn't. It eats up all my CPU, yet it isn't fast; it is a lot slower than my old solution. I still hope I can speed this up before I have to switch back to MySQL. The profile of a run says that "boost::multi_index::safe_mode::check_same_owner<..." eats most of the CPU time.
> Any suggestions on how to speed it up with Multi_index? Or any ideas about which other approach would be faster than MySQL inserts?
> Thanks, Manu
Try using the hashed_non_unique index instead of the ordered_non_unique one. It accesses keys through hash values rather than through a comparison function.
My personal opinion is that if your words are in the database anyway, you should not retrieve them from there only to store them again. An SQL solution will always be faster, since the database knows how to optimize statements and result sets.
With Kind Regards,
Ovanes Markarian
> Try using the hashed_non_unique index instead of the ordered_non_unique one. It accesses keys through hash values rather than through a comparison function.
Would this really work? If I use the hashed_non_unique index, I can't use std::distance and upper_bound to get the count of identical words, because it isn't sorted anymore. Or am I wrong?
> My personal opinion is that if your words are in the database anyway, you should not retrieve them from there only to store them again. An SQL solution will always be faster, since the database knows how to optimize statements and result sets.
The original data does not come from the database; if it did, I would do it directly with a user-defined function or stored procedure. In my application the data is downloaded from the internet and written to the DB before or after counting the words. (I'm counting in the database with "INSERT ... ON DUPLICATE KEY UPDATE" statements.)
Cheers, Manu
On Mon, April 30, 2007 20:26, Manuel Jung wrote:
>> Try using the hashed_non_unique index instead of the ordered_non_unique one. It accesses keys through hash values rather than through a comparison function.
> Would this really work? If I use the hashed_non_unique index, I can't use std::distance and upper_bound to get the count of identical words, because it isn't sorted anymore. Or am I wrong?
Please take a look at:
http://www.boost.org/libs/multi_index/doc/reference/hash_indices.html#hash_i...
There is a member count (2 overloads), which counts all items with a given key, and another member, equal_range (2 overloads), which returns a pair of iterators marking the beginning and end of the range.
>> My personal opinion is that if your words are in the database anyway, you should not retrieve them from there only to store them again. An SQL solution will always be faster, since the database knows how to optimize statements and result sets.
> The original data does not come from the database; if it did, I would do it directly with a user-defined function or stored procedure. In my application the data is downloaded from the internet and written to the DB before or after counting the words. (I'm counting in the database with "INSERT ... ON DUPLICATE KEY UPDATE" statements.)
Ok, wanted to be sure. ;)
> Cheers, Manu
With Kind Regards, Ovanes Markarian
>>> Try using the hashed_non_unique index instead of the ordered_non_unique one. It accesses keys through hash values rather than through a comparison function.
>> Would this really work? If I use the hashed_non_unique index, I can't use std::distance and upper_bound to get the count of identical words, because it isn't sorted anymore. Or am I wrong?
> Please take a look at:
> http://www.boost.org/libs/multi_index/doc/reference/hash_indices.html#hash_i...
> There is a member count (2 overloads), which counts all items with a given key, and another member, equal_range (2 overloads), which returns a pair of iterators marking the beginning and end of the range.
I took a look at it. Thank you very much. I have never used a hashed index, but I should try one sometime. For now, thanks to the quick solution a few posts earlier, I will optimize elsewhere. But I'll come back if needed!
Thank you for your help,
Bye, Manu
Hello Manuel,
----- Original message -----
From: Manuel Jung
> Hi,
> I have to count a lot of words. Up to now I did it with MySQL, because it was easy, and the result is saved there anyway. Now I thought I could speed this up a little by using a Multi_index list internally to store the words, so that I only have to insert the distinct words. The words are stored in a UnicodeString from the ICU library. My code is very close to the one in the example "sequenced.cpp". I'm using the following definition:
> typedef multi_index_container< UnicodeString, indexed_by< sequenced<>, ordered_non_unique
> text_container;
> typedef nth_index
> ::type ordered_text;
> text_container tc;
> I'm inserting new words with "tc.push_back(UnicodeString(NewWord));" and counting them exactly as in the example. I thought this would be fast, but it isn't. It eats up all my CPU, yet it isn't fast; it is a lot slower than my old solution. I still hope I can speed this up before I have to switch back to MySQL. The profile of a run says that "boost::multi_index::safe_mode::check_same_owner<..." eats most of the CPU time.
This trace indicates that you've turned on Boost.MultiIndex safe mode; this mode and its companion invariant-checking mode are huge CPU eaters, intended only for catching programming errors in debug builds. Please turn them off and time again: is the performance adequate now?
Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
Good evening,
> This trace indicates that you've turned on Boost.MultiIndex safe mode; this mode and its companion invariant-checking mode are huge CPU eaters, intended only for catching programming errors in debug builds. Please turn them off and time again: is the performance adequate now?
Yeah, I used the release build and viewed the profile. It looks different: still an MI function on top, but a different >very< often used list. So this should be okay. Also, MySQL is now the bottleneck; my application eats much less CPU time. Thank you!
Greetings, Manu
participants (3)
- JOAQUIN LOPEZ MUÑOZ
- Manuel Jung
- Ovanes Markarian