
On automatic differentiation:

Russel: I am glad you brought this up. I think this is about the coolest application of generic/functional programming I have ever seen, and it was a driver for what I have been talking about here with math_function. To give a brief summary of AD:

* All functions implemented on a computer (+, *, sin(), etc.) are actually composed of fairly simple atomic operations.
* If you use a (compile-time) stream of recursive applications of the chain rule and product rule, you can get a gradient that is as close to analytically correct as the implementations of the underlying functions themselves.
* For anyone following the "[Proto] implementing an computer algebra system with proto" thread, I believe this is a very similar type of problem.
* No more expensive and inaccurate finite differences, since you will have a single, direct function for the gradient.
* Because we are usually dealing with multi-dimensional functions, you typically need to think through how the product/chain rules interact with arrays as well.
* Many implementations of AD use a text pre-processor for Fortran, etc., so you can immediately guess that one way to approach this is with expression templates.

The paper you mentioned is a great reference. Another thing for those interested is the Sacado subproject of Trilinos (at Sandia); see https://cfwebprod.sandia.gov/cfdocs/CCIM/docs/Sacado_06.ppt. Another implementation I was looking at is http://www.fadbad.com/download/FlexibleAD-talk.pdf.

A few notes on these implementations:

* In order to use expression templates and overloading, they have ended up having to subvert or replace the type system, or replace standard functions with their own templates. This may be necessary in some form, but right now you end up having to write all of your code with non-standard data structures, types, and functions, and you often need to set up a complicated framework to get AD support.
* You need expression templates for all of the functions, but Boost already has a lot of functions one would want to use: http://www.boost.org/doc/libs/1_35_0/libs/math/doc/html/index.html
* For many problems, one also needs to make one's own functions AD-aware. For example, my research usually involves solving a functional equation (dynamic programming), where you pass in another function as part of the parameters (for an example, see http://jacek.rothert.googlepages.com/vfi.pdf).
* I am already far beyond my level of competence, but it appears to me that a lot of the problems in making things look syntactically easy are similar to what lambda solved. Perhaps it is possible to do all of this without replacing types and functions. Using boost::proto may make it much more standard for people to convert their own functions to being AD-ready.
* You can probably see here why I was thinking that a general math_function concept is necessary to build on if we want to start adding in things like this. Of course, the AD guys would really need to get involved with adding their stuff to Boost to get everything working well.

Stephen: Thanks as always; a few follow-ups:
Question 3: How would we adapt a structure implementing operator() and gradient() to be stored here? Just like boost::function accepting any function object?

    struct square {
        double operator()(double x) { return x * x; }
        double gradient(double x) { return 2 * x; }
    };

    math_function<double (double)> myfunc(square());

    template<class F>
    math_function(const F& f)
        : f_(f),
          g_(bind(&F::gradient, f, _1...?)) // Here I don't see how to bind to the arbitrary
                                            // signature, extracting the inputs from the
                                            // Signature, which may be more than one.
Easily. Great to hear. I think that getting the constructor working might be a good first step. Any hints on how to extract the list of inputs from the signature, regardless of arity? This seems to be a common problem I have when dealing with functors and boost::bind. What might the "...?" look like in the example I posted?
Question 3: What are the thoughts on whether/how this is extensible to higher dimensions? Let's start the gradient with mapping functions f : R^N -> R.

Sure. But wouldn't you want DomainDim to be the arity of the function, at least by default?

It might be nice to extend it that way, but things may be a lot easier if you just assume that the dimension lives only in the first parameter, which is passed in as some kind of array concept. Why? Because most multi-dimensional work in optimizers and solvers is already done with arrays for the parameters, and the other arguments to the function are often constants/parameters that you don't want to differentiate over.

Last: what is the type of the gradient? If we allow different types, it ends up having to be a tuple. What about the second derivative? A cross-product of tuples? Probably possible, but not worth it.

Also, the math_function concept should really have an in/out overload, for example:

    void gradient(const array& x, array& grad_out) { ... }

This is for compatibility with a lot of other libraries that use this approach, especially when you are worried about temporaries.
However, it is fairly easy to give math_function members that return these traits at runtime. Is there a canonical example of the best way to do this?
Thanks, Jesse