Saturday, December 14, 2013

Azure for hosting HTML files

I'll be teaching a friend some HTML, and was looking for an easy, free host for playing sites; Azure has free websites, but I was afraid I'd need to teach them webmatrix of VS express (nothing against them, but it is one more thing to learn). I just discovered Azure can deploy sites from a git repo !! yipee !

Sunday, November 24, 2013

XML queries in MS SQL Server

Recently I needed to extract some data that was stored as XML in a SQL Server 2012 db; I found the documentation lacking, mainly because you need to know both MS proprietary SQL extensions and XPath, to query the xml. Here's what I figured; it may not be optimal, but it works :) (BTW, I think Server 2008 has the same XML methods, but I only tested with 2012).

The xml type

SQL server supports an XML datatype; we use singlle-quotes to denote its values, but it is not quite a varchar type. Below we define a table Person, with id, name and an XML field called SkillsXML; we insert two rows. The xml looks like <skills>  <skill level="High"> SQL</skill> ... </skills>.

Simple queries

With this, we can use the query method of the xml datatype to select only certain nodes; for example, the following query would select only the first skill for each person. Notice we're using an XPath expression for selecting here; simple XPath expressions look like folders; here we're selecting from the root of the document, the skills node, and from there the first skill node (notice XPath starts counting at 1, not 0). Notice the value returned is of type xml, not a varchar, which is fine to just look at it, but you may want to cast it as a varchar to use it in other places.

Getting the value of an attribute

We can also use the value method, which allows us to get the value of an XML attribute; this method takes 2 arguments, an XPath expression and a string with a SQL datatype; the value of the XPath expression will be cast (or converted) to that type. The following query gets the value of the level attribute of the first skill.

Nodes queries

The most interesting queries use the nodes xml operator; with this, and cross apply (or outer apply) we can get one row per xml element; for example, the following query gets us each person with their skills; since each of our person rows has 2 skills, we will get 4 rows total. We can make this much nicer by using a WITH expression, selecting both the skill name (which is text inside the element and its level attribute, and casting each of them as a varchar; the following example selects only those skills with Medium level.

To Do

I will post examples using XML namespaces, and will add links to other references soon.

Saturday, November 23, 2013

Web api and Dapper for super-simple REST APIs

Recently c# got the dynamic keyword, which allows for truly dynamic typing (like in Ruby, Python or Javascript), and allow you to define objects to which you can add fields at runtime. In fact, we now have ExpandoObject in the standard library.

An obvious place to take advantage of that is with communicating to databases; Most times, we're still using a relational database, and use SQL to communicate with it. SQL is dynamically typed (at least in the sense that your types are coming from a database, independently of your program), and translating it to our objects usually involves lots of gruntwork; it would be nice if the language helps with that gruntwork; with dynamic and ExpandoObject being available, there are now a few micro-frameworks for DB access; one of the simplest is Dapper ; it just adds a few extension methods to the SqlConnection class, and can either automatically put data into our classes, or into dynamic objects (which is what we'll do).

The new-ish web api in automatically generates REST APIs with very little code; with that, and Dapper, we can give access to our database in very little time; for example, if we have a database with one table, called People, with two fields, id, and name; we can generate a controller with code as simple as:

Of course, eventually you may want to use a more complete ORM, like EntityFramework, but for quick prototyping, it doesn't get any simpler than this.

Saturday, November 2, 2013

Vlookup - doing joins in excel

I've been doing some data analysis in excel (and watching somebody who really knows excel :) one of the useful functions I've learned is vlookup, which basically lets you do a join (like in SQL) ; you pass it the value to find, a range of cells over which to look, an index indicating which column to return and a boolean indicating whether to do exact or approximate matching (usually you pass false, to ask for exact matching, a behavior like SQL)

invoke-webrequest : wget for Windows

Many times you need to do a quick test that a web site or program works, or you want to download a web page within a script; in unix/linux, you'd probably use curl or wget ; Powershell (as of 3.0) has invoke-webrequest , which does a similar function.

Although it has many other options, the most basic invocation just takes a URL and it returns the 'output' of that request, as an object containing several fields; the Content field contains the actual information returned (the web page), so you can use something like this:


to find out its length, and could use String.Contains and such to verify that it returns the info you expected (so you could script a simpletest for your web app).

Other useful parameters include -OutFile to save the page to a file, and -UseDefaultCredentials to have the request use your windows credentials, in an intranet.

Saturday, October 26, 2013

Powershell: Array2Object function

I've been using Powershell a lot recently (I'll be blogging a lot more about it soon); sometimes, you get an array and it is convenient to transform it into an object (so we can apply select and similar operators to it); objects in powershell can be created with the New-Object cmdlet, and we pass it the kind of object; the basic powershell object is of type psobject, and we can use Add-Member to add a field to it (powershell objects, like in most scripting languages, can have fields added dynamically to them)

The following function will create an object, given an array ; the object have fields f1,f2 ... corresponding to the array elements 0,1,...

function array2Object($arr)
    $obj=New-Object psobject
    foreach($field in $arr) {
        Add-Member -InputObject $obj -MemberType NoteProperty -Name "f$fn" -value $field
    return $obj

Sunday, September 8, 2013

Categorizing your unit tests in C# (Visual Studio)

C# comes with a unit testing framework (at least in Visual Studio :). You write a test case like this:

public void TestMethod1() ...

When you have more than a few test cases, it is useful to break them into groups or categories; you can use:

  • Owner("name")
  • TestCategory("category")
  • TestProperty("name","value")
For example:

[TestMethod, Owner("Orlando"),TestCategory("stuff"),TestProperty("group","1")]
public void TestMethod1() ...

On the test explorer window, VS will show you your tests organized based on those values, and you can then select and run only those tests you want.

Wednesday, July 17, 2013

Improving your command line in windows

I've just read Making a better, somewhat prettier, but definitely more functional Windows Command Line, and tried a couple of his suggestions, since I'll be using Windows a lot more.

I already knew (and had installed) Git-For-Windows (I called it git-bash :), which is incredibly cool, since it installs bash and a bunch of Unix commands (including ssh, and which :) and have been using MobaXTerm (which also includes an XWindows server, and so it's great for communicating with Linux servers  (and I know I will be learning powershell :)

But I got to try Clink which is simple but amazingly useful; it embeds Gnu's readline into cmd.exe, so your command terminal uses tabs for autocompletion (and shows the choices, like bash does), and uses 'normal' (well the ones my fingers know :) keybindings, like Ctrl-A to go to the beginning, Ctrl-E to go to the end, and Ctrl-R to search your history.

I also installed Gow, which installs a bunch of Gnu utilities; I didn't find it that useful, since most of them come with git-for-windows, but includes wget (which I always use :) and adds a menu to the file manager that allows you to open a command prompt in a given folder, which is useful.

Tuesday, May 28, 2013

Retrying things in ruby

Working with remote systems that are not always under your control often involves failures; I'm working on a system that does an API call on a remote system, and the remote system needs to call other systems to finish the call (it is calling a licensing server, from another company). The call already has timeouts and delays, but it crashes often enough, which requires me to restart the job (the job is inside a resque queue).

Now, if we try to do something, and it fails, it will usually raise an exception; Ruby has exception handling, including the rescue keyword, which will restart the block that started the exception; Ruby also has blocks, procs and lambdas, which allow us to pass a piece of code (OK, a closure :) to another function or method, so we can define a function as follows:

And then we could use it like:
Voila ! we now can easily retry our operation several times; if we wanted to be fancy, we could add a time to sleep between tries, and even define a small DSL for this, but for my needs this was enough.

Tuesday, April 23, 2013

My rails cheatsheet

I've been using rails more often now; however, I still google lots of basic stuff :) this is my personal cheat sheet, but might be also useful to others. I will be adding more as time goes on.

ActiveRecord Datatypes

  • :binary, :primary_key
  • :boolean
  • :date, :datetime, :time:timestamp
  • :decimal, :float, :integer
  • :string, :text
  • Relationships: :belongs_to, :has_one, :has_many (still need to add FK fields)

Tuesday, January 1, 2013

Memoization in C++

Many problems lend themselves well to recursive solutions; we express a big problem in terms of a similar problem that is 'smaller' in size, for which we call our same function to solve it, until we get to a problem that is small enough that we know how to solve it (we tend to call the known one the base cases).

 The classical example of a recursive function is the factorial function; we know that the factorial of 0 is 1, and we can define the factorial of an integer, say n, greater than one, as the number, n times the factorial of n-1, with the obvious c++ implementation as follows:
Another of the classical problems is generating a number in the Fibonacci sequence; this sequence is like: 0,1,1,2,3,5,8,13 ... where we start with 0 and 1, and then each number is the sum of the two previous ones; we want to write a function, let's call it fibo, that, given the place or index in the sequence would return the number, so fibo(0) would yield 0, fibo(2) would yield 1, fibo(7) 13 and so on. We could define this function as follows:
Many times, we end up solving the same sub-problem more than once, which can heavily decrease performance; for example, in the function above, we would end up solving fibo(n-2) 2 times, fibo(n-3) 4 times and so on, exponentially; one easy way we can solve this is to keep track of the values we've already calculated, and just calculate a new value if it's not in this cache; an easy way to store the values is in a map (in this case, since we're talking about unsigned ints, we could use a vector, but a map works in the general case). The code would look like:
This code is, in practice, much much faster than the other one (goes from exponential, to n*log(n) ). This general technique of storing already-calculated values for a function in a cache is called memoization.
We can generalize this pattern, and create a generic memoizer function, called memoize. Since we don't know the type of the input or output values, we need a templatized function, with two type parameters, as follows:
To use it, we would need to define our function with a lambda:
And in other place, replace the variable with its memoized variation:
Of course, this would only work for functions that take only one argument, but it would not be hard to do it for other kinds of functions, although the pattern is more important than the particular functions :).
Many would realize that there's a better iterative algorithm for Fibonacci numbers; problems where memoizing helps can be solved with dynamic programming iterative algorithms, but, in general need a better understanding of the algorithm; when faced with a new problem, we can usually start with a recursive solution, memoize if we have performance problems and we suspect or know we're solving the same subproblems repeatedly, and move to dynamic programming only if we cannot achieve good performance with memoization.