Sunday, October 14, 2012

Getting started with Amazon's DynamoDB in Python

Amazon provides many cloud services, both for Infrastructure as a Service (running Virtual Machines in EC2) and Platform as a Service (providing the actual services). One of the services is DynamoDB, a simple NoSQL database, hosted on Amazon's servers.DynamoDB, like many NoSQL Databases is a simple key-value storage, with some added capabilities.
Basically, you store objects (representable in JSON), matched to a key. Your key can be a string or an int (and actually, DynamoDB allows your key to be composed of two fields, a hash and a range, but here we will only do the hash). Your object is just a dictionary mapping fields to values.
The best python library for DynamoDB (and for most other aws services) is boto.  The easiest way to install boto is with pip; just do pip install boto or sudo pip install boto.
Amazon offers a free tier for DynamoDB, which should be enough for playing and small applications (the DB has to be less than 100M in size, and you get 5KB in reads and 1KB in writes per second), so head down to aws.amazon.com and create your account, if you don't have it; after making your account, go to security credentials and find out your access key id and your secret access key. You may want to store your credentials in a file called .boto in your home folder, as explained here. After that, you can connect to your dynamodb as follows:
Now, let's try to create one table. Our table will be called users, and its key will be a string. We first need to create a schema (the schema only specifies the name and type of your key; your 'table' can contain any kind of objects, with any attributes), and then use that schema to create a table, as follows:
Once we have created the table, we go onto adding items; we create a dictionary containing our data, then use the new_item method of our table to create the item, and finally the put method of that item to actually add it to the database, as follows: And, after creating several items we can obtain an element (knowing its key) with the get_item method of a table, as follows (notice it will throw an exception if the key doesn't exist): To modify the item in the DB, we would just modify it in memory (it looks like a dictionary) and then call its put method again.
Most of the times you will be searching for a given key (or at least will know a part of the key), but if not, you can use the scan method (notice this method goes through EVERY object on the table, and so may be slow and consume a lot of I/O). You can pass a list of conditions (as of now, the documentation is incorrect, stating you specify the conditions with strings; you need to use elements coming from boto.dynamodb.condition). A scan would look like: For more information, check: Or come back :) I should be blogging more about dynamodb (and other aws services) soon.

Friday, October 12, 2012

Cons lists in C++

Lisp and other functional languages have purely-functional lists; that is, lists that cannot be changed; this leads to a very different style of programming, and can save memory and time, since parts of the list can be shared among several lists; with the new std::shared_ptr in C++, implementing such a list is relatively simple.

Although lisp lists allow mixing of types (and so are equivalent to binary trees), we will implement typed lists, which allow elements of only one type, using templates. We want to provide 4 functions:
  • cons, which takes an element and a list, and produces another list, containing that element as its first element (and the passed list as the elements after the first element; notice we're going to share this passed list).
  • car, which takes a list and returns its first element (sometimes called head).
  • cdr, which takes a list and returns another list, the list minus the first element (sometimes called tail).
  • isEmpty, which returns true if the list is empty, false otherwise.
Since cons is the operation that makes the list, I am calling my node ConsNode and my list class ConsList (btw, boost has a somewhat similar structure, using template metaprogramming, which means it can only be defined at compile time; see bost::const). Our Node class can be defined as:
And we can define our list class, simply as a synonym for shared_ptr (notice that both shared_ptr and template typedefs are c++11 features; template typedefs are implemented in g++4.7 and newer). The shared_ptr is a smart pointer, that basically implements a reference-counter pointer; several shared_ptrs can point to the same dynamically allocated chunk of memory; every time one is destroyed (because it goes out of scope), the counter is decremented; when the counter reaches 0, the memory is deallocated by calling delete. This allows us to not worry about calling delete manually.
Now, cons is the function that will create a new list, given an old list and a new element (notice this does NOT change the original list at all; we want to avoid modifying the lists, so they can be shared as parts of other lists). Our cons function is going to call std::make_shared, which creates a new reference-counted chunk of memory (you can create a shared_ptr from a plain dynamically allocated pointer, but at the cost of an extra call to new). Our cons function looks like: And car, cdr and isEmpty are simply defined as: For convenience, we can define an operator<< ; notice we're using car, cdr and isEmpty, and using recursion. With car, cdr and isEmpty, we can define functions recursively, like we usually would do in lisp or scheme (well, with slightly ugly template syntax :). Below are examples for len (which returns the length of a list) and sum (which adds up all the elements of a list of numbers). And we could use lists in a main function, like below (notice how the lists share a lot of the storage, which saves both time and space).

Thursday, October 4, 2012

Functional sets in C++

I am taking a functional programming class. One of the exercises involves defining sets in terms of functions; since C++ now supports lambdas (and closures), I figured it would be interesting to implement them in C++ (the assignment was in a different language. It is similar to assignments in SICP)

The idea is to define a set (of integers, to simplify), as a function; you pass the element and it returns whether the element is in the set or not. So, we can define a Set as a function that takes a boolean and returns an int. I also define a predicate, which is identical to a set. BTW, I'm using the new std::function  type.
Now, we can define a singletonSet function, that takes an int, and returns a set that contains that int. We need to create a function on the fly, at runtime, and that's what lambdas let us do. We define a lambda with the [] syntax, and the = inside means to capture any variables (make a closure) by value. In this case, we're capturing the elem variable, so every call to singletonSet returns a new lambda, with the right value of elem!
And with that, we can define the union of two sets, as a new function (lambda) that returns true if either set contains the elements, and the intersection as a function that returns true if both sets contain the element, as in the following code:
You could use singletonSet and set_union, as follows:
And set_intersection as follows:
The assignment had a couple other functions (foreach and exists), and you can find the full source, some unit tests and a demo at my github