Tuesday, January 1, 2013

Memoization in C++

Many problems lend themselves well to recursive solutions; we express a big problem in terms of a similar problem that is 'smaller' in size, for which we call our same function to solve it, until we get to a problem that is small enough that we know how to solve it (we tend to call the known one the base cases).

 The classical example of a recursive function is the factorial function; we know that the factorial of 0 is 1, and we can define the factorial of an integer, say n, greater than one, as the number, n times the factorial of n-1, with the obvious c++ implementation as follows:
Another of the classical problems is generating a number in the Fibonacci sequence; this sequence is like: 0,1,1,2,3,5,8,13 ... where we start with 0 and 1, and then each number is the sum of the two previous ones; we want to write a function, let's call it fibo, that, given the place or index in the sequence would return the number, so fibo(0) would yield 0, fibo(2) would yield 1, fibo(7) 13 and so on. We could define this function as follows:
Many times, we end up solving the same sub-problem more than once, which can heavily decrease performance; for example, in the function above, we would end up solving fibo(n-2) 2 times, fibo(n-3) 4 times and so on, exponentially; one easy way we can solve this is to keep track of the values we've already calculated, and just calculate a new value if it's not in this cache; an easy way to store the values is in a map (in this case, since we're talking about unsigned ints, we could use a vector, but a map works in the general case). The code would look like:
This code is, in practice, much much faster than the other one (goes from exponential, to n*log(n) ). This general technique of storing already-calculated values for a function in a cache is called memoization.
We can generalize this pattern, and create a generic memoizer function, called memoize. Since we don't know the type of the input or output values, we need a templatized function, with two type parameters, as follows:
To use it, we would need to define our function with a lambda:
And in other place, replace the variable with its memoized variation:
Of course, this would only work for functions that take only one argument, but it would not be hard to do it for other kinds of functions, although the pattern is more important than the particular functions :).
Many would realize that there's a better iterative algorithm for Fibonacci numbers; problems where memoizing helps can be solved with dynamic programming iterative algorithms, but, in general need a better understanding of the algorithm; when faced with a new problem, we can usually start with a recursive solution, memoize if we have performance problems and we suspect or know we're solving the same subproblems repeatedly, and move to dynamic programming only if we cannot achieve good performance with memoization.

2 comments:

  1. Nice pattern! Of course you may want to add a "time to live" to the values in your map so that your cache doesn't grow forever and eat all the memory. In that case, using a boost::multi_index where you pick a hash table and a list and as items are found, move them to the front of the list. (delete and add), OR a hash table and a heap where you modify the time, and reheap-fy.

    In anycase fun to think about.

    ReplyDelete
  2. The other common interview quiz question "find the nth prime" also benefits from this optimization.

    ReplyDelete