Saturday, February 11, 2012

What is functional programming?

One assertion made a lot is that functional programming is an important concept for handling concurrency. So what is functional programming, and how is it applicable to concurrent programming?

While there are a lot of topics can be included in a discussion of functional programming, there are two key ideas: Higher order functions and pure functions.

Higher Order Functions

Higher order functions are functions that are meant to be applied to other functions. They can also be thought of as functions as data. The idea is that functions can be passed as parameters to a function or a function can return a function.

A common example of passing a function as a parameter is passing a function to filter data. Using Linq in C# you may write something like:

    var GeorgiaCustomers = Customers.Where (c => c.StateCode == "GA");

The section in parenthesis is called a lambda expression. Read aloud it is "C goes to c dot statecode equals GA". The lambda expression is a function that we are passing as a parameter to the Where method of the Customers collection. The function StateCode == "GA" is applied to each item in the Customers collection; if the function returns true, that gets added to the new GeorgiaCustomers collection.

The Where method takes a single parameter, which is a function that takes a customer and returns a boolean (whether the StateCode matches).

Pure Functions

Pure functions are functions that return a value without having any side effects. Side effects are changes in things outside of a function, such as printing text to the screen, or setting a property or updating a database.

Here is an example in C#.

imperative:

     public static void doubler(int input) 
     { 
          Console.WriteLine(" {0} doubled = {1}.", input, input * 2) 
     } 

     public static void main(string[] args)
     {
          doubler(2);
     }

functional:

     public static int doubler(int input) 
     { 
           return input * 2 ; 
     }

     public static void main(string[] args)
     {
           int result = doubler(2);
           Console.WriteLine(" {0} doubled = {1}.", 2, doubler(2))   
     }

While this is a silly example, the difference is clear. In the imperative example we are calling a double and print method, and in the functional example we are calling a double method and printing the result. Which is more reusable? Which is easier to test?

Why is this important

If you want to start executing programs on multiple processors you need to break the work into pieces that can be run independently. Consider a function that will return the sum of the squares of a range of numbers. In C#, you might write something like:

int SumSquares(int maxInput)
{
     int accumulator = 0;
     for (int x = 1; x <= maxInput; x++)
     {
          accumulator += x * x;
     }

     return accumulator;
}

This code is written to be executed sequentially. You have single thread of execution which accumulates a value and increments a counter each time through a loop.

In F# it would be more natural to write the program like this:

let SumSquares maxInput =
    {1 .. maxInput}
    |> Seq.map(fun x -> x * x)
    |> Seq.sum

The Seq.map function is working exactly like the Where method did in our first example. It is being passed a lambda function and applying it to a list of items. It is returning a list of the results.

The |> operator takes the output of a function and sends it as an input of the next function.

First we create a list of numbers and then we pass that as an input to a function which returns a list contains the squares of an original list of numbers.

This list is then being applied to the Seq.Sum method, which is a function that returns the sum of all of the elements in a list.

The same function in Clojure could be written:

(defn SumSquares  [maxInput]
  (reduce +
       (map (fn [x] (* x x))
              (range 1 (+ maxInput 1)))))

The syntax is different, but this works the same as the F# version. Working from the inside out (and bottom to top) we create a range from 1 to the maxInput, which we input to the map function which applies the square function to each element.

The important feature of both the F# and Clojure implementations is that by passing an anonymous function and a list to a map function and then only doing the accumulation after the fact, each number in the list can be processed independently.

Neither of these implementations is actually running in parallel right now, they are just written in a way that will easily support parallelism.

Continuing on with the Clojure version, to make it parallel all we have to do is change the map function to pmap. I am also going to modify my anonymous function so that it produces a side effect to show that it does work in parallel and why side effects in parallel programs cause problems.

The new version is:

(defn SumSquares  [maxInput]
  (reduce +
  (pmap (fn [x] 
         (do
           (println x) 
           (* x x)))
    (range 1 (+ maxInput 1)))))
In this version, the anonymous function now prints out its input and then returns the square of the input. Running it twice, I got the following 2 results:

We are no longer guaranteed that our map will execute in a particular order. Adding the side effect of writing to the console just looks strange. I especially like the second execution where the program switched from the 1 thread to the 4 thread before the 1 thread wrote its own newline character, which is how we ended up with 14 on the first line and then a blank line a couple lines later.

Summary

Writing in a functional style supports breaking work up into pieces that can be run in parallel. Being able to pass functions as parameters to higher order functions allows for the creation of things like pmap which handle the dispatching of work to different threads. Writing pure functions, without side effects, leads to solutions that can be broken into the smaller independent pieces that can be passed to a map or pmap function.

No comments:

Post a Comment