Any interest in a framework for memoization?
Hi,
I would like to submit a small framework for memoization I am working on
(http://projects.giacomodrago.com/c++memo/). At the moment, it is still
in a beta stage and does not comply with Boost guidelines. I would like
to know if you might be interested.
The idea is to have a tool for rapid prototyping of software components
implementing dynamic programming algorithms or, anyhow, requiring
memoization.
I paste here a simple "fibonacci" example:
----------------------
int fibonacci(int i, std::function<int(int)> prereqs) {
    if (i == 0) return 0;
    if (i <= 2) return 1;
    return prereqs(i-1) + prereqs(i-2);
}

cppmemo::CppMemo cppMemo;
std::cout << cppMemo.getValue(30, fibonacci) << std::endl;
----------------------
Hi Giacomo, thank you for your contribution.
I'm interested in this kind of library; in particular, I often implement dynamic programs. I've got some remarks/questions.
* In your implementation you use the std::function adaptor. You can remove it by making some of the member functions templates (e.g. getValue).
* I'm interested in the case when the Key type is integral. In this case one can make the following optimization. Let density be the "number of elements in the map" / "largest element in the map" ratio. If the density is large, it is much better to use a vector instead of a hash map. The idea is to measure the density and, when it exceeds a certain threshold, switch from the hash map to a vector. Did you consider this optimization? (I'm aware this is not as easy as I described.) This makes sense when you consider the unbounded knapsack problem. In UKP it is not clear (you have to decide at runtime) whether you should use the vector or the hash map strategy. IMO the "integral Key" case covers many standard dynamic programs.
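A single-threaded sketch of that density heuristic (hypothetical names, int values for brevity; the concurrent version is the hard part):

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <unordered_map>
#include <vector>

// Entries start in a hash map; once size() / (largest key + 1) exceeds a
// threshold, they migrate to a vector indexed directly by key.
class IntMemo {
public:
    void put(std::size_t key, int value) {
        if (dense_) {
            if (key >= vec_.size()) vec_.resize(key + 1);
            vec_[key] = value;
            return;
        }
        map_[key] = value;
        max_key_ = std::max(max_key_, key);
        // density = elements / (largest key + 1); switch when dense enough
        if (static_cast<double>(map_.size()) / (max_key_ + 1) > threshold_)
            switch_to_vector();
    }

    std::optional<int> get(std::size_t key) const {
        if (dense_) {
            if (key < vec_.size() && vec_[key]) return vec_[key];
            return std::nullopt;
        }
        auto it = map_.find(key);
        if (it == map_.end()) return std::nullopt;
        return it->second;
    }

    bool dense() const { return dense_; }

private:
    void switch_to_vector() {
        vec_.assign(max_key_ + 1, std::optional<int>{});
        for (const auto& kv : map_) vec_[kv.first] = kv.second;
        map_.clear();
        dense_ = true;
    }

    double threshold_ = 0.5;
    bool dense_ = false;
    std::size_t max_key_ = 0;
    std::unordered_map<std::size_t, int> map_;
    std::vector<std::optional<int>> vec_;
};
```

Making put/get safe under concurrent writers (as the library would need) is exactly the part this sketch leaves open.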
Regards,
Piotr
On 18 November 2013 09:30, Giacomo Drago wrote:
Hi,
I would like to submit a small framework for memoization I am working on (http://projects.giacomodrago.com/c++memo/). At the moment, it is still in a beta stage and does not comply with Boost guidelines. I would like to know if you might be interested.
The idea is to have a tool for rapid prototyping of software components implementing dynamic programming algorithms or, anyhow, requiring memoization.
I paste here a simple "fibonacci" example:
----------------------
int fibonacci(int i, std::function<int(int)> prereqs) {
    if (i == 0) return 0;
    if (i <= 2) return 1;
    return prereqs(i-1) + prereqs(i-2);
}

cppmemo::CppMemo cppMemo;
std::cout << cppMemo.getValue(30, fibonacci) << std::endl;
----------------------
The code will print 832040, without recursive calls to the "fibonacci" function. A stack is managed "under the hood" and appropriate calls are made to the "fibonacci" callback to gather pre-requirements and then call the function again when they are available.
This is just a toy example. There is more to know, including important caveats (if you are interested, please read the documentation carefully): some defaults for template and method parameters are useful for conciseness but are EVIL if you don't know about them. The framework also provides automatic parallelization (when possible).
It is important to remark that at this moment the framework relies on fcmm (fast concurrent memoization map), a custom hashmap that is optimized for this very purpose. This can be changed quite easily, as fcmm is not exposed to the client. A C++11 compliant compiler is needed.
Thanks for your attention.
Giacomo
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Hi Giacomo, thank you for your contribution.
I'm interested in this kind of library; in particular, I often implement dynamic programs. I've got some remarks/questions.
* In your implementation you use the std::function adaptor. You can remove it by making some of the member functions templates (e.g. getValue).
I see. Out of curiosity, what is the reason to do so (maybe performance)? I chose std::function over template parameters as I find it quite convenient; e.g. I can pass nullptr to the function and then check this condition inside the function body. Turning getValue into a template function will imply major refactoring: surely I will do it if it's required, and if you have some spare time, I'd appreciate insights based on the actual code.
* I'm interested in the case when the Key type is integral. In this case one can make the following optimization. Let density be the "number of elements in the map" / "largest element in the map" ratio. If the density is large, it is much better to use a vector instead of a hash map. The idea is to measure the density and, when it exceeds a certain threshold, switch from the hash map to a vector. Did you consider this optimization? (I'm aware this is not as easy as I described.) This makes sense when you consider the unbounded knapsack problem. In UKP it is not clear (you have to decide at runtime) whether you should use the vector or the hash map strategy. IMO the "integral Key" case covers many standard dynamic programs.
Interesting. How would you do it, with a template specialization for int/long/etc... and maybe std::pair keys? This would be relatively easy; what I find tricky is to dynamically switch from the hash map to the vector while preserving thread safety. Do you have any suggestions/examples?
On 22 November 2013 14:35, Giacomo Drago wrote:
Hi Giacomo, thank you for your contribution.
I'm interested in this kind of library; in particular, I often implement dynamic programs. I've got some remarks/questions.
* In your implementation you use the std::function adaptor. You can remove it by making some of the member functions templates (e.g. getValue).
I see. Out of curiosity, what is the reason to do so (maybe performance)? I chose std::function over template parameters as I find it quite convenient; e.g. I can pass nullptr to the function and then check this condition inside the function body.
std::function is not free: you gain type erasure but you lose some performance. You can read about it, for example, here: http://stackoverflow.com/questions/5057382/what-is-the-performance-overhead-of-stdfunction
Turning getValue into a template function will imply major refactoring: surely I will do it if it's required, and if you have some spare time, I'd appreciate insights based on the actual code.
I think in this case it doesn't need major refactoring: each time you use the underlying type, you add a template parameter to the function. Maybe I missed something.
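The shape of the change could look like this (a simplified sketch with a hypothetical Memo class, not the real CppMemo interface): the callable's concrete type is deduced at the call site instead of being erased behind std::function, so the compiler can inline it.

```cpp
#include <unordered_map>

template <typename Key, typename Value>
class Memo {
public:
    // Before: Value getValue(const Key&, std::function<Value(Key)>).
    // After: the callable's concrete type F is a template parameter,
    // deduced at the call site -- no type erasure, inlinable.
    template <typename F>
    Value getValue(const Key& key, F fn) {
        auto it = cache_.find(key);
        if (it != cache_.end()) return it->second;  // memoized hit
        Value v = fn(key);
        cache_.emplace(key, v);
        return v;
    }

private:
    std::unordered_map<Key, Value> cache_;
};
```

The nullptr-check convenience Giacomo mentions can be kept with an extra overload taking std::nullptr_t, so it need not force std::function everywhere.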
* I'm interested in the case when the Key type is integral. In this case one can make the following optimization. Let density be the "number of elements in the map" / "largest element in the map" ratio. If the density is large, it is much better to use a vector instead of a hash map. The idea is to measure the density and, when it exceeds a certain threshold, switch from the hash map to a vector. Did you consider this optimization? (I'm aware this is not as easy as I described.) This makes sense when you consider the unbounded knapsack problem. In UKP it is not clear (you have to decide at runtime) whether you should use the vector or the hash map strategy. IMO the "integral Key" case covers many standard dynamic programs.
Interesting. How would you do it, with a template specialization for int/long/etc... and maybe std::pair keys? This would be relatively easy; what I find tricky is to dynamically switch from the hash map to the vector while preserving thread safety. Do you have any suggestions/examples?
My first thought was specialising the class for the case when std::is_integral<Key>::value is true. Unfortunately, I do not have any smart thoughts concerning thread safety.
I appreciate your feedback and I'll work on your suggestion, but do you think this library may eventually find its place in Boost?
I can only say that I'd like to see a library like that in Boost, but I'm afraid this is not my decision.
Regards,
Piotr
Thank you for your time.
Giacomo
I successfully applied your suggestion of avoiding std::function. I have also cleaned up and refactored the whole source code. I gained a remarkable ~30% performance boost. As for the specialization for integral keys, I put it in the "backlog" but made no progress so far.
I can only say that I'd like to see a library like that in Boost, but I'm afraid this is not my decision.
Thank you, Piotr. It appears there is no such interest at the moment. However, this conversation gave me useful insights. Have a look at the latest version of the library: I encourage you to contribute (with benchmarks, further ideas, or source code) if you like.
Regards,
Giacomo
----- Original Message -----
I can only say that I'd like to see a library like that in Boost, but I'm afraid this is not my decision.
Thank you, Piotr. It appears there is no such interest at the moment.
Specific to the here and now, I assume that the migration to GitHub is occupying a lot of spare cycles in the community.

In my opinion, memoization seems like a useful capability. I have no particular application for it on my to-do list, but having used memoization in Prolog before, it seems like the sort of thing I might someday find myself wishing were in Boost.

IMO, proposals like this fall into a certain category that might get less traction because, if you randomly selected 1000 programmers, only a handful (at most) would have an application needing the proposed library. I think my current proposal for edit_distance/edit_alignment is also in this category. So is the handful of 'tree' proposals. Most applications simply don't need it. Finding the people with the right combination of knowledge/interest in Boost, plus the domain knowledge to give good feedback, is tricky. And yet... when you want it, you want it :) I'm not sure what the answer to that quandary is. It's not like Boost has none of these; Quaternion/Octonion come to mind as an example.

I've seen other projects (e.g. Octopress) that maintain a page of "notable 3rd-party contributions" that aren't in the official project but can be used as a resource for people interested in capability X that isn't in the official distribution. Alternatively, the new GitHub library organization will allow people to maintain customized library forks, so that is a possible avenue. For example, once the dust clears, I could fork boost::algorithm and maintain my own version that includes edit_distance, and periodically merge in the official version, etc.
In my opinion, memoization seems like a useful capability.
I am glad to know! I would like to remark that this library doesn't just provide memoization: it can also turn some recursive DP algorithms into non-recursive (stack-based) ones. Moreover, it provides thread safety and automatic parallelization out of the box...
IMO, proposals like this fall into a certain category, that might get less traction because if you randomly selected 1000 programmers, only a handful (at most) would have an application needing the proposed library.
I totally see your point: I didn't expect much feedback, and in no way do I think it is due to me. ;)
Best
Giacomo
participants (3)
- Erik Erlandson
- Giacomo Drago
- Piotr Wygocki