Tuesday, February 28, 2012

On Lisp in Clojure ch 2 (2.4) Polymorphism

This is my second post translating the examples from On Lisp by Paul Graham into Clojure.

Section 2.4 is titled Functions as Properties. What Graham is describing in his example is polymorphism, and there are a couple of ways to do this in Clojure.

First, here is the Clojure version of the method using a conditional statement.

(defn behave [animal]
  (cond
   (= animal 'dog) (do '(wag-tail) '(bark))
   (= animal 'rat) (do '(scurry) '(squeek))
   (= animal 'cat) (do '(rub-legs) '(scratch-carpet))))

My version has a lot of quotes where the original does not, such as 'dog and '(wag-tail), because I wanted my code to compile without actually having to create a wag-tail function. Any new animals we create will require editing our behave function.

Rather than go into the details of Graham's alternative method in the book, I will just jump to two ways to do the same thing in Clojure.

Protocols

The first is with protocols. A protocol is a list of functions that can be applied to different data types. Each function must take at least one parameter, and it is this required parameter that determines what actually gets called at run time. Protocols are applied to data types that are defined with either deftype or defrecord. For the purposes of protocols, the two are interchangeable, as in my example:

;; define the protocol
(defprotocol animal
  (behave [this] ))

;; define a dog
(defrecord dog [breed]  )

;; add the animal protocol to dog type
(extend dog
  animal
  {:behave (fn [src] (do '(wag-tail) '(bark)))})

;; create a dog
(def my-dog (dog. "collie"))

;; see what it does
(behave my-dog)

;; define a rat
(deftype rat [color])

;; add the animal protocol to the rat type
(extend rat
  animal
  {:behave (fn [src] (do '(scurry) '(squeek)))})

;; create a rat
(def brown-rat (rat. "brown") )

;; see what it does
(behave brown-rat)

Protocols allow you to add polymorphic behavior to any data type. To get an idea of how powerful they are, try typing this in:

(extend String
  animal
  {:behave (fn [src] (do '(what)))})

(behave "huh")

Then consider that java.lang.String is declared as final.

Multimethods

While protocols are similar to overriding functions, multimethods are like overloading. Protocols broaden overriding to make polymorphism work independently of class hierarchies. Multimethods allow overloading based not just on parameter types, but even on parameter values.

Here's the multimethod version of the book's example.

;; define the multimethod, dispatching on the value itself
(defmulti behave-multi identity)

;; define implementations for our animals
(defmethod behave-multi 'dog [x]
  (do '(wag-tail) '(bark)))
(defmethod behave-multi 'rat [x]
  (do '(scurry) '(squeek)))

;; try them out
(behave-multi 'dog)
(behave-multi 'rat)

The following example is pretty dumb, but it shows you a way to choose which function to call based on parameter values, instead of just parameter types as in traditional overloading.

(defmulti two-behaviors (fn [num]
                          (if (odd? num)
                            :odd
                            :even)))

(defmethod two-behaviors :odd [num]
  (str num " is odd"))

(defmethod two-behaviors :even [num]
  (str num " is even"))

(two-behaviors 3)
(two-behaviors 4)

This was just a glimpse into protocols and multimethods in Clojure. The Clojure books I have seen devote a chapter to each, and I suspect they still leave a lot unsaid.

Monday, February 27, 2012

On Lisp in Clojure ch 2 (2.1 - 2.3)

This is the first post translating the examples in On Lisp into Clojure. The book is available to read for free online, so please read the book, and check here if you are not sure how it applies to Clojure.

In the book, chapter 1 described the extensibility and flexibility of Lisp. In short, it lays out the reason I am going through the book. The few code samples in this chapter didn't seem to benefit from translation though.

Chapter 2 is about Functions. Section 2.1 discusses functions as a data type. It doesn't contain any examples.

Section 2.2 Defining Functions

;; double function -- note this shadows clojure.core/double, so the REPL will warn
(defn double [x] (* x 2))

;; invoking double
(double 1)

;; referring to the function as a variable
#'double
(= #'double (first (list #'double)))

;; lambdas are defined two ways in Clojure
(fn [x] (* x 2))
#(* % 2)

;; invoking an anonymous function
((fn [x] (* x 2)) 3)
(#(* % 2) 3)

Next, the book assigns a variable double, and shows that the function and the variable can exist side by side with the same name. This works in Common Lisp because it has separate namespaces for functions and variables; it does not work in Clojure. Common Lisp is said to be a Lisp-2 (two namespaces), while Clojure (and Scheme) are Lisp-1s.

;; we can assign a function to a variable
(def x #'double)
(x 2)

Section 2.3 Functional Arguments

;; three expressions with the same effect
(+ 1 2)
(apply + '(1 2))
(apply + 1 '(2))
;; note: (apply + 1 2) throws an error -- apply's last argument must be a seq

;; funcall doesn't exist in Clojure
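Since Clojure is a Lisp-1, funcall isn't needed; here is a small sketch (my example, not from the book) of calling a function held in a variable directly:

```clojure
;; Common Lisp needs (funcall f 1 2); in Clojure the value
;; in function position is simply invoked
(let [f +]
  (f 1 2))
;; => 3
```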

;; map applies a function to each argument in a list
(map #(+ % 10) '(1 2 3))
(map + '(1 2 3) '(10 100 1000))

;; sort is ascending by default
(sort '(1 4 2 5 6 7 3))
;; to specify sort order, pass the function then the list
(sort > '(1 4 2 5 6 7 3))

;; instead of remove-if, in Clojure use filter, true values are returned
(filter odd? '( 1 2 3 4 5 6 7))
;; pred can also be an anonymous function
(filter #(= 0 (rem % 3)) '(1 2 3 4 5 6 7 8 9))

;; our-remove-if was implemented with recursion in the book.
(defn our-remove-if [pred lst]
  (if (empty? lst)
    nil
    (if (pred (first lst))
      (our-remove-if pred (rest lst))
      (cons (first lst) (our-remove-if pred (rest lst))))))

(our-remove-if even? '(1 2 3 4 5))

;; The actual filter function is implemented with a lazy sequence
;; You can see the implementation of core functions with (source ...)
(source filter)
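As a sketch of the lazy style (a simplified illustration, not core filter's real implementation), our-remove-if can be wrapped in lazy-seq so that it even works on infinite sequences:

```clojure
(defn our-remove-if-lazy [pred lst]
  (lazy-seq
    (when (seq lst)
      (if (pred (first lst))
        (our-remove-if-lazy pred (rest lst))
        (cons (first lst) (our-remove-if-lazy pred (rest lst)))))))

;; take forces only as much of the sequence as it needs
(take 3 (our-remove-if-lazy even? (range)))
;; => (1 3 5)
```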

I am going to stop here, because section 2.4 is a big enough topic that it should have its own space.

Reading On Lisp to learn Clojure

When I read Beating the Averages by Paul Graham, I knew I needed to look into Lisp.

One of the first things I learned was that "Learning Lisp" really meant learning a Lisp. Lisps are a family of languages with some common attributes. The most significant attribute is that Lisp code is written in a Lisp data structure. Since code manipulates data, and code is data, code can manipulate code. This manipulation is done with Lisp macros.
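A tiny illustration of code-as-data (my own example): a Clojure expression is literally a list, so it can be taken apart or built up and then evaluated.

```clojure
;; the code (+ 1 2) is just a list of a symbol and two numbers
(def expr '(+ 1 2))

(first expr)  ;; => the symbol +
(eval expr)   ;; => 3

;; and code can build code
(eval (list '* 3 4))  ;; => 12
```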

Because there are several languages that are Lisps, I had to pick one. I opted for Clojure, initially because it could compile down to .NET IL or Java byte code or Javascript. Soon I found out how well thought out its data structures and support for concurrency is.

I am confident that Clojure is the right Lisp for me, but that still leaves the question: is Lisp right for me? Are Lisp macros really that powerful? Chas Emerick, in Clojure Programming, describes how Lisp macros differ from other languages' metaprogramming support: they are code, not strings; they are applied at compile time, not at run time; and they use the same language that programs are written in, not special API calls. But does all this matter?

It was Graham's essay that prompted the question, and I am hoping that one of Graham's books can answer it. On Lisp is available to read free online. According to the preface, the first 6 chapters deal with functions and the remaining 19 deal primarily with macros.

On Lisp should help me learn to think in macros. It is written in Common Lisp, which is different from Clojure. Hopefully translating it will help me to think in Lisp in Clojure.

I am not going to copy the whole book. I plan to post Clojure versions of the examples, with some explanation when it seems appropriate. If you want to play along, start reading Graham's book, and check back here for my Clojure interpretations of it.

Saturday, February 25, 2012

Different Platform/Different World

Between .NET and the JVM is a much wider gulf than I had imagined. Switching from C# to Clojure has meant 3 changes: imperative to functional, non-lisp to lisp and .net to jvm.

The switch from imperative to functional is a recurring theme on this blog. The reasons for switching to a Lisp will be the subject of another post (maybe posts). Switching platforms was incidental and that made it all the harder.

C# and Java are similar, but the programmers think differently. If you find yourself switching from one platform to the other, it may go more smoothly if you are aware of the differences.

.NET is punctuated equilibrium. Java is incremental evolution.

On the .NET platform, from time to time Microsoft releases a new version with new features and this becomes the new standard. Cases where Microsoft says that the new technology is an alternative, not a replacement, only mean that the transition will go a little slower.

To be a .NET developer is to be a Visual Studio developer. It is the IDE. It is also the build tool and dependency manager. Changes to the IDE are changes to the platform and changes to the platform have to be supported by the IDE to be adopted.

In the Java world, new practices come more from ideas, libraries and tools coming from the community than from the company controlling the language (first Sun, now Oracle). When a library becomes standard, it is because the community has found it useful. There are several different IDEs, and there are developers who don't use IDEs at all.

Living in the .NET world is long periods of comfort interrupted by periodic shocks. Visual Studio is a great tool, there is a standard way to do things and there is great support for the standard. Each .NET release is an epochal change from the prior version. The new versions tend to be great, but they are sudden, and often huge, changes.

Based on what I can see from the outside, it looks like it is the responsibility of each organization that uses the JVM to determine the right way to do things. It means working hard to keep up with many different projects, each taking its own approach to solving an individual problem.

If you work on the JVM, you have to spend more time thinking about your tools, but if you work hard to keep up, do you avoid the abrupt changes? Not having lived through it, I can't say, but I sure hope so!

Wednesday, February 15, 2012

Project Euler and Refactoring

I started doing the problems on Project Euler the other day. For anyone that doesn't know, Project Euler is a website that has a collection of math problems for you to solve. You can solve the problem any way you want. The site keeps track of which problems you have submitted correct answers to.

Just finding the right answer is a good programming challenge, but refactoring makes for much deeper learning.

Another benefit from doing the problems is that you are building samples of your work that you can show to prospective employers. If you are going to show the world your code, you want it to be clean.

Here is an example of my process solving one of the problems in Clojure. I can't show what I have learned without displaying my ignorance, but I am less ignorant now because of this process.

Every line of code you see here represents at least 5 lines of experimentation preceding it.

Problem 3 asks, "What is the largest prime factor of the number 600851475143 ?"

Solution 1

Find all of the divisors of n, filter the ones that are prime, and then take the largest. To find all of the divisors, iterate through all of the numbers from 1 to n, and test the remainder of n divided by i.

Works just fine with 42. Doesn't show any signs of returning with 600851475143 . Restart the REPL.

  • Learned a bit more about 'for' like the difference between :while and :when.

Solution 2

Create a function, called afactor, that finds a single factor. In Clojure you don't use loops with breaks; you use sequences and take the items you want. Lazy sequences don't calculate values until you ask for them, so you can create an expression which calculates all of the divisors, but only actually compute one or two.

(defn afactor [base]
  (nth (for [x (range 1 (+ 1 base))
             :when (= 0 (rem base x))] x ) 1))

  • Learned how to use 'for' in a lazy way.

Now to create a function that returns a pair of factors.

(defn factor [base]
  (let [first (afactor base) ]
    (list first (/ base first))))

  • Got more confidence with 'let' binding.

The next step is a test of whether a number is prime. A prime number has exactly two divisors, but counting all of the divisors of our number is going to leave us restarting the REPL again. (Restarting is just a menu item in Clooj, of course, but it is painful to admit that I was dumb enough to think counting would work, so we will skip that step.) Since a prime number is divisible only by 1 and itself, its second-smallest divisor will be itself.

(defn prime? [base]
  (= base (afactor base) ))

  • I assumed (= base (afactor base)) would allow me to return a boolean, glad to see that it actually does.

The first call to factor with our number yields a result large enough that I don't care to retype it. Let's set up a manual recursion of 1 level.

(defn prob3 [base]
  (map factor
       (filter (complement prime?)
               (factor base))))

  • Playing around with this function and others like it really got me confident with the #(%) syntax of anonymous functions. Not long ago they were a mystery, but with repetition they are comfortable.
  • Playing with the ugly (map factor (filter ... construction was good practice with maps and lists even if the resulting code is terrible.

It is not hard to see a way to the right answer from this. Either you take the output of prob3 and call prob3 again, typing in the first result, or you could nest the map factor filter inside of another map factor filter. This isn't anything you are going to show off to your friends.

Solution 3

Solving an earlier problem on 4clojure, I found out that Clojure has a function called tree-seq. We always drew trees when we were factoring numbers in school, so that will probably yield a more satisfying result. tree-seq takes three parameters: branch?, children, and root. branch? is a function that tests whether a node has children, children is a function that returns the children of a node, and root is the initial value.

(defn factor-tree [base]
  (tree-seq (complement prime?) factor base))

  • I got to learn how to use tree-seq. Also, prime? was returning true when a node didn't have children, exactly the opposite of the desired result. After (not prime?) failed, I now understand when to use not and when to use complement.

factor-tree returns a list of all of the factors. That is easy enough to handle. Getting all of the prime factors is something we may want to do in another problem, so let's give it its own function.

(defn prime-factors [base] 
    (filter #(prime? %)  (factor-tree base)  ))

For this problem, we just want the largest prime factor.

(defn problem3 []
  (apply max (prime-factors 600851475143)))       

  • I was surprised that I had to type (apply max (...)) instead of just (max (...)). In the imperative world you are so often calling methods on parameters that it is not hard to accept that Lisp syntax is different. But every language I have ever used has had a function called max that operates on a list of data, which is why it is hard for my brain to see that (max list) isn't correct.
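To make the difference concrete, here is a sketch of the two calls side by side:

```clojure
(max '(1 5 3))        ;; => (1 5 3) -- max of a single argument, the list itself
(apply max '(1 5 3))  ;; => 5 -- apply unrolls the list into separate arguments
```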

Solution 4

Using the tree-seq makes for a much nicer solution, and I was tempted to stop there. One thing still bothered me though. In one video I saw, Stuart Halloway said that with lazy evaluation, there seldom was a reason to return just one of anything. It is much more flexible to return a sequence and let the caller decide which elements they need.

(defn afactor [base]
  (nth (for [x (range 1 (+ 1 base))
             :when (= 0 (rem base x))] x ) 1))

Needs to be rewritten.

(defn lazy-factor [base]
  (for [x (range 1 (+ base 1))
     :when (= 0 (rem base x))]
     x ))
 

  • The idea of returning a lazy sequence instead of a single value is a different philosophy than I have used in the past.

As you can see, the change was very minor. In fact, it is easier to do it the right way than the wrong way. If I hadn't rewritten it, I never would have found that out!

Because I am now returning a lazy sequence instead of a single element, I do have to modify the callers to choose the element they want.

(defn prime? [base]
  (= base (nth (lazy-factor base) 1)))

(defn factor [base]
  (let [first (nth (lazy-factor base) 1) ]
    (list first (/ base first))))

Solution 5

I thought I was done, but when I moved on to other problems, I kept finding myself typing things like (range 1 (+ 1 base)), as I did in the lazy-factor function. range is built to give you a C-style loop, so (range 5) gives you (0 1 2 3 4). Usually this is fine, but for the Project Euler problems I constantly find myself wanting a BASIC-style loop, (1 2 3 4 5), instead. This was enough motivation for me to attempt my first macro.

(defmacro rangeb [last]
  (list 'range 1 (list '+ 1 last)))
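macroexpand-1 can confirm what rangeb produces (the macro definition is repeated so the snippet stands alone):

```clojure
(defmacro rangeb [last]
  (list 'range 1 (list '+ 1 last)))

(macroexpand-1 '(rangeb 5))  ;; => (range 1 (+ 1 5))
(rangeb 5)                   ;; => (1 2 3 4 5)
```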

With this change, lazy-factor becomes:

(defn lazy-factor [base]
  (for [x (rangeb base)
     :when (= 0 (rem base x))] 
     x ))

Conclusion

I expect that in the not too distant future I will be able to look back at this post and cringe. I am sure that I will find other situations where I find a simpler way to do things and will be tempted to refactor again.

I find myself using 'for' quite a bit, though only in the factor function (afactor and then its replacement lazy-factor) here. for feels imperative, and I wonder if I am overusing it.

My rangeb macro is overly specific. Right now you can only pass an ending number; it always starts at one and increments by one. When the spirit moves me, or when a situation demands it, I will update it. In the meantime I will pretend that YAGNI applies here.

One other thing I think my subconscious is learning is how easy it is to write and test things in Clojure. Next time I use C# I am sure I am going to miss the REPL. Even now, I can ponder having to write methods to call and output my code, compiling every change. /shudder

Saturday, February 11, 2012

What is functional programming?

One assertion made a lot is that functional programming is an important concept for handling concurrency. So what is functional programming, and how is it applicable to concurrent programming?

While there are a lot of topics that can be included in a discussion of functional programming, there are two key ideas: higher-order functions and pure functions.

Higher Order Functions

Higher-order functions are functions that operate on other functions: a function can be passed as a parameter to another function, or a function can return a function. They can also be thought of as treating functions as data.

A common example of passing a function as a parameter is passing a function to filter data. Using Linq in C# you may write something like:

    var GeorgiaCustomers = Customers.Where (c => c.StateCode == "GA");

The section in parentheses is called a lambda expression. Read aloud it is "c goes to c dot StateCode equals GA". The lambda expression is a function that we are passing as a parameter to the Where method of the Customers collection. The test c.StateCode == "GA" is applied to each item in the Customers collection; each item for which it returns true gets added to the new GeorgiaCustomers collection.

The Where method takes a single parameter, which is a function that takes a customer and returns a boolean (whether the StateCode matches).
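The same idea in Clojure, using filter with maps standing in for the customer objects (the data and keys here are hypothetical, just for illustration):

```clojure
(def customers [{:name "Ann" :state-code "GA"}
                {:name "Bob" :state-code "TN"}])

;; the anonymous function plays the role of the C# lambda
(filter #(= (:state-code %) "GA") customers)
;; => ({:name "Ann", :state-code "GA"})
```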

Pure Functions

Pure functions are functions that return a value without having any side effects. Side effects are changes in things outside of a function, such as printing text to the screen, or setting a property or updating a database.

Here is an example in C#.

imperative:

     public static void doubler(int input) 
     { 
          Console.WriteLine(" {0} doubled = {1}.", input, input * 2); 
     } 

     public static void main(string[] args)
     {
          doubler(2);
     }

functional:

     public static int doubler(int input) 
     { 
           return input * 2 ; 
     }

     public static void main(string[] args)
     {
           int result = doubler(2);
           Console.WriteLine(" {0} doubled = {1}.", 2, result);
     }

While this is a silly example, the difference is clear. In the imperative example we are calling a double and print method, and in the functional example we are calling a double method and printing the result. Which is more reusable? Which is easier to test?

Why is this important?

If you want to start executing programs on multiple processors you need to break the work into pieces that can be run independently. Consider a function that will return the sum of the squares of a range of numbers. In C#, you might write something like:

int SumSquares(int maxInput)
{
     int accumulator = 0;
     for (int x = 1; x <= maxInput; x++)
     {
          accumulator += x * x;
     }

     return accumulator;
}

This code is written to be executed sequentially. You have single thread of execution which accumulates a value and increments a counter each time through a loop.

In F# it would be more natural to write the program like this:

let SumSquares maxInput =
    {1 .. maxInput}
    |> Seq.map(fun x -> x * x)
    |> Seq.sum

The Seq.map function is working exactly like the Where method did in our first example. It is being passed a lambda function and applying it to a list of items. It is returning a list of the results.

The |> operator takes the output of a function and sends it as an input of the next function.

First we create a list of numbers, and then we pass it as input to a function which returns a list containing the squares of the original list.

This list is then passed to the Seq.sum function, which returns the sum of all of the elements in a list.

The same function in Clojure could be written:

(defn SumSquares  [maxInput]
  (reduce +
       (map (fn [x] (* x x))
              (range 1 (+ maxInput 1)))))

The syntax is different, but this works the same as the F# version. Working from the inside out (and bottom to top) we create a range from 1 to the maxInput, which we input to the map function which applies the square function to each element.
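Clojure's ->> (thread-last) macro reads top to bottom much like the F# pipeline; here is the same function rewritten with it (a stylistic alternative, not a change in behavior):

```clojure
(defn sum-squares [max-input]
  (->> (range 1 (+ max-input 1))
       (map (fn [x] (* x x)))
       (reduce +)))

(sum-squares 3)  ;; => 14  (1 + 4 + 9)
```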

The important feature of both the F# and Clojure implementations is that by passing an anonymous function and a list to a map function and then only doing the accumulation after the fact, each number in the list can be processed independently.

Neither of these implementations is actually running in parallel right now; they are just written in a way that will easily support parallelism.

Continuing on with the Clojure version, to make it parallel all we have to do is change the map function to pmap. I am also going to modify my anonymous function so that it produces a side effect to show that it does work in parallel and why side effects in parallel programs cause problems.

The new version is:

(defn SumSquares  [maxInput]
  (reduce +
  (pmap (fn [x] 
         (do
           (println x) 
           (* x x)))
    (range 1 (+ maxInput 1)))))

In this version, the anonymous function prints its input and then returns the square of the input. Running it twice, the printed numbers came out in a different order each time.

We are no longer guaranteed that our map will execute in a particular order. Adding the side effect of writing to the console just looks strange. I especially liked the second run, where the program switched from the thread printing 1 to the thread printing 4 before the first thread wrote its newline character, which is how we ended up with 14 on one line and then a blank line a couple of lines later.

Summary

Writing in a functional style supports breaking work up into pieces that can be run in parallel. Being able to pass functions as parameters to higher order functions allows for the creation of things like pmap which handle the dispatching of work to different threads. Writing pure functions, without side effects, leads to solutions that can be broken into the smaller independent pieces that can be passed to a map or pmap function.