Tuesday, January 1, 2013

Memoization in C++

Many problems lend themselves well to recursive solutions; we express a big problem in terms of a similar problem that is 'smaller' in size, for which we call our same function to solve it, until we get to a problem that is small enough that we know how to solve it (we tend to call the known one the base cases).

 The classical example of a recursive function is the factorial function; we know that the factorial of 0 is 1, and we can define the factorial of an integer, say n, greater than one, as the number, n times the factorial of n-1, with the obvious c++ implementation as follows:
Another of the classical problems is generating a number in the Fibonacci sequence; this sequence is like: 0,1,1,2,3,5,8,13 ... where we start with 0 and 1, and then each number is the sum of the two previous ones; we want to write a function, let's call it fibo, that, given the place or index in the sequence would return the number, so fibo(0) would yield 0, fibo(2) would yield 1, fibo(7) 13 and so on. We could define this function as follows:
Many times, we end up solving the same sub-problem more than once, which can heavily decrease performance; for example, in the function above, we would end up solving fibo(n-2) 2 times, fibo(n-3) 4 times and so on, exponentially; one easy way we can solve this is to keep track of the values we've already calculated, and just calculate a new value if it's not in this cache; an easy way to store the values is in a map (in this case, since we're talking about unsigned ints, we could use a vector, but a map works in the general case). The code would look like:
This code is, in practice, much much faster than the other one (goes from exponential, to n*log(n) ). This general technique of storing already-calculated values for a function in a cache is called memoization.
We can generalize this pattern, and create a generic memoizer function, called memoize. Since we don't know the type of the input or output values, we need a templatized function, with two type parameters, as follows:
To use it, we would need to define our function with a lambda:
And in other place, replace the variable with its memoized variation:
Of course, this would only work for functions that take only one argument, but it would not be hard to do it for other kinds of functions, although the pattern is more important than the particular functions :).
Many would realize that there's a better iterative algorithm for Fibonacci numbers; problems where memoizing helps can be solved with dynamic programming iterative algorithms, but, in general need a better understanding of the algorithm; when faced with a new problem, we can usually start with a recursive solution, memoize if we have performance problems and we suspect or know we're solving the same subproblems repeatedly, and move to dynamic programming only if we cannot achieve good performance with memoization.

6 comments:

  1. Nice pattern! Of course you may want to add a "time to live" to the values in your map so that your cache doesn't grow forever and eat all the memory. In that case, using a boost::multi_index where you pick a hash table and a list and as items are found, move them to the front of the list. (delete and add), OR a hash table and a heap where you modify the time, and reheap-fy.

    In anycase fun to think about.

    ReplyDelete
  2. The other common interview quiz question "find the nth prime" also benefits from this optimization.

    ReplyDelete
  3. Memoization is an optimization technique used primarily to speed up computer programs by keeping the results of expensive function calls and returning the cached result when the same inputs occur again.

    Memoization is a programming technique which attempts to increase a function’s performance by caching its previously computed results. Because JavaScript objects behave like associative arrays, they are ideal candidates to act as caches. Each time a memoized function is called, its parameters are used to index the cache. If the data is present, then it can be returned, without executing the entire function. However, if the data is not cached, then the function is executed, and the result is added to the cache.

    Lets begin by taking example, the original function is rewritten to include memoization. In the example, a self-executing anonymous function returns an inner function, f(), which is used as the outer function. When f() is returned, its closure allows it to continue to access the “memo” object, which stores all of its previous results. Each time f() is executed, it first checks to see if a result exists for the current value of “n”. If it does, then the cached value is returned. Otherwise, the original Fibonacci code is executed. Note that “memo” is defined outside of f() so that it can retain its value over multiple function calls.


    var outer= (function() {
    var memo = {};

    function f(n) {
    var value;

    if (n in memo) {
    value = memo[n];
    } else {
    if (n === 0 || n === 1)
    value = n;
    else
    value = f(n - 1) + f(n - 2);

    memo[n] = value;
    }

    return value;
    }

    return f;
    })();

    ReplyDelete
  4. I think your code is flawed. The fact that map is not empty, doesn't mean that it contains the desired input parameter. You need a find there.

    ReplyDelete
    Replies
    1. Actually, my comment is flawed. I imagined size() instead of count(n) there.

      Delete
    2. So to not end up adding just the noise and no useful info, let me mention that static initialization of the map above is thread-safe only in C++11, also depending on the compiler support for C++11.

      Delete