Sunday, April 27, 2014

Automatas (finite state machines): A programmer's perspective

Finite state machines are simple and nifty; when I was teaching theory, I saw students, even good programmers, had trouble with them when presented as math, so I figure showing them as programs could make it click for some people.

From http://cg.scs.carleton.ca/~michiel/TheoryOfComputation/ (a free ToC book)

Definition 2.2.1 A finite automaton is a 5-tuple M = (Q, Σ, δ, q, F), where
1. Q is a finite set, whose elements are called states,
2. Σ is a finite set, called the alphabet; the elements of Σ are called symbols,
3. δ : Q × Σ → Q is a function, called the transition function,
4. q is an element of Q; it is called the start state,
5. F is a subset of Q; the elements of F are called accept states.

Doesn't that sound fancy ? It means we have an object with 5 fields (a 5-tuple), Q, a set of things called states, Σ, a set of symbols called the alphabet (we'll use characters as our symbols), δ, a function that takes a state and a symbol and returns a state (conceptually we are 'moving' from one state to another when we see a symbol), q, an element of Q , called the start state and F a subset of Q, the set of accepting states.

In Scala, we can represent it as:
class DFA[StateType]
(
val Q:Set[StateType],
val F:Set[StateType],
val q0:StateType,
val delta: (StateType,Char)=>StateType,
val Sigma:Set[Char]
)
{
...
view raw DFA1.scala hosted with ❤ by GitHub


We can then define the extended transition function, which says which state we end up at by following a string, recursively as
This means if the string is empty, we stay at the state we're at; if not, we follow the first character (by using delta), and then follow the rest of the string from there; this recursive definition maps nicely to a List in scala (or any other language with cons-style lists :), so we end up with the following code:
def deltaHat(state:StateType,input:List[Char]):StateType= {
input match {
case Nil => state;
case head::tail => deltaHat(delta(state,head),tail)
}
}


And the automata accepts a string if we would go from the initial state to the final state by following that string; in Scala
def accepts(input:List[Char]):Boolean =F contains deltaHat(q0,input)
To use it we would first define a delta function, say:
// sample DFA transition function
def dfa1_delta(state:Int,inp:Char)={
state match {
case 0=>
if(inp=='a') 0 else 1;
case 1=>
if(inp=='a') 2 else 1;
case 2=> 2
}
}
view raw dfa_delta.scala hosted with ❤ by GitHub
And then we can use like this:
val Q=Set(0,1,2);
val F=Set(2);
val d=new DFA(Q,F,0,dfa1_delta,Set('a','b'));
println(d.accepts(List('b','a')));
view raw dfa_use.scala hosted with ❤ by GitHub
The full code is at https://github.com/okaram/scala/tree/master/automata, including non-deterministic automata (which I will probably get around to blog about some other time :)

Thursday, April 17, 2014

Accessing Databases with R (RStudio)

R has libraries for accessing several databases, but if you're on Windows, you'd probably prefer the RODBC package; with this, you can access almost any db through ODBC. You can create a DSN and use odbcConnect, or use odbcDriverconnect ; the problem is figuring out the connection string, especially since you need to specify the driver. For SQL Server, this works (assuming db is in localhost):


library(RODBC)
c=odbcDriverConnect('Driver=SQL Server;Server=localhost;Database=......;Trusted_Connection=True')

And then you can do odbcquery like:
res=sqlQuery(c,"SELECT * FROM ....")

And you get a data frame !

http://www.connectionstrings.com/ has odbc connection strings for many dbs

Wednesday, April 2, 2014

Super quick Linq to XML

I was trying to figure out how to get started with Linq to XML, and found this article. I figured I'd simplify it even more for myself :)
Imagine we have the following file, saved in c:\contacts.xml
<contacts>
<contact id="1">
<firstName>Orlando</firstName>
<lastName>Karam</lastName>
</contact>
<contact id="2">
<firstName>Lina</firstName>
<lastName>Colli</lastName>
</contact>
</contacts>
view raw contacts.xml hosted with ❤ by GitHub

Then we can access it as follows
XDocument loaded = XDocument.Load(@"C:\contacts.xml");
var contacts = from c in loaded.Descendants("contact")
select new {
id=(int)c.Attribute("id"),
name=(string)c.Element("firstName") +" " +(string)c.Element("lastName"),
};
foreach (var c in contacts)
Console.WriteLine("Contact: id={0}, name = {1}", c.id, c.name);
view raw sample.cs hosted with ❤ by GitHub
Notice:

  • on an XDocument, we can call descendants and pass it an element name, to get all the elements with that name (we can also call the no-args version, and get all descendants)
  • What we're getting is an XElement, on which we can call methods like Element or Attribute
  • We can directly cast an element into a string, and it gets us all the text inside that element
  • We can cast an attribute as a string, or an int (or even a float, double or many other types)