Tuesday, January 31, 2012

Why C++ and where to look

The theme of this blog is finding out how programming will have to change to deal with computers containing dozens or hundreds of processors. So far most of the posts have been related to Clojure. In addition to Clojure, F# and C# also have good answers to parallelism.

The approach in all three of these languages is to write applications that can be run in parallel safely, so that you can benefit from having multiple processors without breaking anything. In all three cases, the actual hardware is abstracted away.

Beyond the sheer number of processors in a machine, we are also seeing a diversity of processors being used within a single machine. Graphics Processing Units are very powerful, but their vector processing gives a bigger boost to some operations than others. GPUs and other chips have reduced instruction sets, which means that some code that is perfectly legal on a CPU won't work on a GPU.
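To make "some operations more than others" concrete, here is a minimal sketch in Java (used only for illustration; the class and method names are invented, and nothing here talks to a GPU). The first loop applies the same arithmetic to every element, which is the shape vector hardware likes. The second does a data-dependent amount of branching work per element, so neighboring elements cannot be processed in lockstep.

```java
class VectorFriendly {
    // Same operation on every element, no branches in the loop body:
    // the kind of work that maps well onto vector hardware.
    static float[] scaleAndAdd(float[] a, float[] b, float k) {
        float[] out = new float[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] * k + b[i];
        }
        return out;
    }

    // Data-dependent branching: each element follows its own path and
    // runs a different number of iterations (Collatz step count).
    static int[] collatzSteps(int[] starts) {
        int[] out = new int[starts.length];
        for (int i = 0; i < starts.length; i++) {
            long n = starts[i];
            int steps = 0;
            while (n > 1) {
                n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
                steps++;
            }
            out[i] = steps;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] r = scaleAndAdd(new float[] {1f, 2f}, new float[] {10f, 20f}, 3f);
        System.out.println(r[0] + " " + r[1]);
        int[] s = collatzSteps(new int[] {1, 6, 7});
        System.out.println(s[0] + " " + s[1] + " " + s[2]);
    }
}
```

Both loops are legal on a CPU. Only the first has the uniform shape that a vector unit can speed up substantially.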

Reducing the number of clock-cycles and choosing lower power processors are important whether you are trying to increase the battery life of a mobile device or having to power (and cool) a server farm. Cloud computing, which provides CPUs as a metered service, also encourages using cycles sparingly.

If you want to optimize cycles and choose chips, you need to be running "on the metal", and that means C or C++.

There have been a lot of changes to C++ in the C++11 specification. There are also new libraries that deal specifically with the problems of developing on different types of chips. When even Microsoft, with their traditional focus on drag and drop development and managed code, starts talking about a C++ renaissance, you know it is a big deal.

Microsoft is hosting a two-day "Going Native" event on February 2nd and 3rd. The event is being streamed live (for free) at http://channel9.msdn.com/Events/GoingNative/GoingNative-2012. I am sure they will also have videos of the keynotes and sessions available later on the Channel 9 site.

I will not watch it live (I am still trying to learn Clojure), but it is timely information, so I thought I would pass it along.

Wednesday, January 25, 2012

Setting up clojure emacs on windows

Setting up a Clojure development environment proved to be quite a challenge, especially given my handicap of using Windows.

The most widely used editor for Clojure is Emacs. I don't have any idea how to use it, but I have seen video of people who know what they are doing, and I really want that!

There are lots of options for setting up Clojure. If you are just starting out, there are too many. There are at least half a dozen editors that have Clojure plug-ins. Emacs itself has two. The instructions available also tend to present different ways of accomplishing the same thing.

I am not going to give you lots of options. I am going to give you the steps to go from a clean Windows 7 install to having the most popular plug-in for the most popular editor using the most popular build tool for Clojure. I am sure there is a lot of value to all of the flexibility that other people support, but it is lost on me and if you are reading this, it is probably lost on you too.

Also, I intend to cover each step in excruciating detail. I know from experience that any instructions on this subject, no matter how detailed, would include at least one or two steps that assumed I actually knew how to use Emacs. If you follow these instructions and there is a step that is not completely clear to you, please put it in the comments so that I can fix it. If the detail is too much for you, just read the bold print!

Our goal is to have a working installation of the Leiningen build tool, and the SLIME plug-in for the Emacs editor.

Before I start, let me say this works best if you do not already have Clojure installed on your computer, or more specifically, if you do not have the clojure.jar file in your classpath. The Leiningen install will fail if you do.

Install the Java Development Kit (JDK)


Version 1.6 or later is required; I am running 1.7 Standard Edition. Download this from Oracle. There is also a link to installation instructions on that page.

After the JDK is installed, you need to make sure that the environment variables are set up correctly. (Control Panel >> System and Security >> System >> Advanced system settings >> Environment Variables...)

Under user variables I have JAVA_HOME with a path to the folder where the jdk is installed. On my system this is C:\Program Files\Java\jdk1.7.0_01

Under System Variables, the Java bin directory needs to be in your Path. Click Edit on the Path variable so you can scroll through and see whether these entries are already there. If not, add them (path items are separated with ; on Windows): %JAVA_HOME%\bin\ and the physical path to the bin directory that holds the JDK executables. In my case this is C:\Program Files\Java\jdk1.7.0_01\bin\

Test the Java installation from a command prompt (click the Windows button in your taskbar and type cmd in the search box). At the command prompt type:
javac -version
If the computer responds with a version number, the JDK is set up properly. If it says that javac is not found, you have a problem you need to fix before you move on.
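javac -version only shows that the compiler is on the path. For a slightly fuller check, the sketch below compiles and runs a tiny program that prints the Java version and reports whether the bin folder under JAVA_HOME appears in the path. JdkCheck is a hypothetical helper written for this post, not part of the JDK or Leiningen, and it assumes the variable names used above.

```java
import java.io.File;

class JdkCheck {
    // Returns true if any entry in the given PATH string points at dir.
    static boolean pathContains(String path, String dir) {
        if (path == null || dir == null) {
            return false;
        }
        String wanted = normalize(dir);
        for (String entry : path.split(File.pathSeparator)) {
            // Case-insensitive compare, since Windows paths ignore case.
            if (normalize(entry).equalsIgnoreCase(wanted)) {
                return true;
            }
        }
        return false;
    }

    // Trims whitespace and trailing slashes so "C:\x\bin\" matches "C:\x\bin".
    static String normalize(String p) {
        String s = p.trim();
        while (s.length() > 1 && (s.endsWith("\\") || s.endsWith("/"))) {
            s = s.substring(0, s.length() - 1);
        }
        return s;
    }

    public static void main(String[] args) {
        String javaHome = System.getenv("JAVA_HOME");
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("JAVA_HOME    = " + javaHome);
        if (javaHome != null) {
            String bin = normalize(javaHome) + File.separator + "bin";
            System.out.println("bin on PATH  = " + pathContains(System.getenv("PATH"), bin));
        }
    }
}
```

Save it as JdkCheck.java, then at the command prompt type javac JdkCheck.java followed by java JdkCheck. If both commands work, the compiler and the runtime are both reachable.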

Install Leiningen

(build tool)
Create a folder that will hold the Leiningen batch file and also a helper utility it needs. Mine is in c:\lein. To create it I went to the command prompt and typed
cd \
md lein
After that type exit to close that command prompt window. Add the path to your new folder to your path setting.

Download curl from
http://www.paehl.com/open_source/?download=curl_723_1_ssl.zip
and place curl.exe into your new folder.

Download libssl from
http://www.paehl.com/open_source/?download=libssl.zip
and place libeay32.dll and ssleay32.dll into your new folder.

Download the lein.bat file from
https://raw.github.com/technomancy/leiningen/stable/bin/lein.bat
and put it into your new folder. (My browser displayed the text, so to save it I did file >> save page as and then navigated to my c:\lein folder.)

Open a new command prompt and type
lein self-install

After the installation completes, you can test your Leiningen install by typing
lein repl
at the command prompt. It should launch the Clojure REPL, which you can test by typing
(+ 1 2)
at the user prompt. It should answer 3.

Click the x in the upper right to close the command prompt.

Install Emacs

(Editor)
Download Emacs from
http://ftp.gnu.org/pub/gnu/emacs/windows/emacs-23.1-bin-i386.zip
Extract the emacs-23.1 folder and put it somewhere. I just put mine in c:\

Create a folder named .emacs.d to hold plugins for Emacs. I put mine in my emacs-23.1 folder.

Create a new user environment variable called HOME. In the value, put the path to the folder that contains .emacs.d. In my case this is C:\emacs-23.1

Add the path to the folder that holds emacs.exe to your path. Mine was C:\emacs-23.1\bin

Install Clojure mode

Create a file called init.el in your .emacs.d folder and enter this text
(add-to-list 'load-path "~/.emacs.d/")
(require 'clojure-mode)
Add the file clojure-mode.el from
https://github.com/technomancy/clojure-mode/blob/master/clojure-mode.el
to your .emacs.d directory. I found the easiest way to do this was to copy the text in the code window at that URL and paste it into a new text file that I called clojure-mode.el. If you just download the webpage, you will get lots of HTML markup that will cause errors in Emacs.

Install Swank plugin

Open a new command prompt and type
lein plugin install swank-clojure 1.3.4
-- Note: initially this install hung for me. When I disabled AVG LinkScanner, it worked right away.

Create a new Clojure project. From the command prompt, navigate to a folder where you would like to create Clojure files. From c:\Users\Rick I typed
md projects
then
cd projects
Create the new project by typing
lein new testproj
then type
cd testproj
emacs

After Emacs loads, type
alt-x clojure-jack-in
(hold Alt and press x, then type clojure-jack-in and press Enter). Emacs will spend a couple of moments processing the plugin. After this, you should have a running REPL that you can test the same way that you tested the REPL from Leiningen. Type
(+ 1 2)
If you get 3, it works.

Monday, January 23, 2012

Clojure is one answer

My last post was a link to a video talking about the challenges of many-core computing. Today I am linking to another video from Channel 9. This one is a discussion with Rich Hickey about Clojure. The topics build on one another: introducing Clojure, why Clojure is a Lisp, functional programming, lists and vectors, persistent data structures, identities and concurrent programming. I recommend the whole video, but if you just want the section on concurrency, it starts at 37:15.

edit: For people who didn't like the embedded player, here is the link to the video on msdn.

Thursday, January 12, 2012

Laying out the problem

I recently saw a video on Channel 9 that provides a great background on parallel programming.

The amount of information available to developers from Channel 9 is incredible. Naturally there is good coverage of Microsoft technologies, but there is also a lot of material that has nothing to do with Microsoft.

The video is an interview with Burton Smith. I recommend it to anyone.

If you don't want to watch the interview, or the whole interview, here is how I would break down the talk:

(0:30) Hardware for mainstream computing is changing in a way that makes ideas that were once only relevant to supercomputers important to the mainstream.

(1:52) Multi-core refers to 8 or fewer processors; many-core refers to either more than 8 processors, or utilizing different types of processors such as CPUs and Graphics Processing Units.

(3:08) In the early days supercomputers were powerful machines used for general purpose, transactional processing tasks.

(4:05) The introduction of vector processing caused supercomputing to become much more specialized. Programs that had a lot of branching could not benefit from vector processing. Tasks that involved performing the same operation across a set of data could be written to use Instruction Level Parallelism and were able to get big gains from vector processing.

(5:37) After vector processing, the next big change was distributing computations among many smaller computers. This is the state of supercomputing today. Distributed parallel computing works by different machines sending messages to each other, though within each machine itself there may not be any parallel processing going on. In practice this makes supercomputing even more specialized than vector processing did.

(7:17) Now that mainstream computers are being built with multiple cores, the goal is to make parallel processing very general.

(8:56) In the 80s people realized that the physical limits of a single core would be reached. Their research didn't lead to success in the marketplace, but we can learn today from the knowledge they gained.

(10:00) It is not only computer architecture that you worry about. One of the major problems is programming languages.

(10:45) A decision made at the foundation of computing that is inhibiting attempts at parallelization is the idea of a variable.

(11:32) In a computer that executes one instruction at a time, you know the order in which the instructions will be done.

(14:28) Smith gives an analogy of parallel computing by describing a development team and talking about the implication of an office full of programmers working together to write a piece of software. He describes the risks of mutability and the strengths of messaging and immutability.

(16:25) Tweaking C++ a little bit isn't sufficient for solving parallel problems. The transformation needed is much more radical and will change programming styles forever.

(20:33) The dominant programming languages today require the reading and writing of shared data. The programs are challenging and difficult to maintain as a result.

(21:00) Software Transactional Memory is one of the alternatives being pursued today.

(21:46) SQL is a functional programming language that a lot of people are able to use and understand. When people wish to modify data they use transactions.

(22:46) Excel is a functional programming environment. Every cell contains either a value or an expression that is a function of other values.

(26:28) Even in a parallel functional program, when state is modified it is still done sequentially. The difference is the imperative state changes become isolated pieces of a program.

(30:54) Databases are an example of a persistent store that has to be accessible in parallel, with changes done in isolation.

(32:43) Databases combine transactions with functional programming. That is the most successful parallel programming in the world.

(33:03) "There are only 3 ways to do parallel computing: 1. Get an expert. 2. Use functional Language. 3. Use transactions."

(33:40) Changes have to be done in isolation, but there are many ways to get the isolation in either space or time.

(33:20) Based on our experience with databases, transactional memory should be able to scale very well, but just adding to C++ probably won't be enough.
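The points about variables (10:45) and instruction ordering (11:32) can be seen in a few lines. This is a minimal Java sketch, not something shown in the interview, and the class and field names are made up for illustration. Several threads bump an ordinary shared variable and an AtomicInteger side by side.

```java
import java.util.concurrent.atomic.AtomicInteger;

class SharedCounter {
    static int plain = 0;                                     // ordinary shared variable
    static final AtomicInteger atomic = new AtomicInteger(); // each update is indivisible

    // Starts `threads` threads that each increment both counters
    // `perThread` times, then waits for all of them to finish.
    static void runThreads(int threads, final int perThread) {
        plain = 0;
        atomic.set(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < perThread; i++) {
                        plain++;                  // read, add, write: steps that can interleave
                        atomic.incrementAndGet(); // one indivisible step
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
    }

    public static void main(String[] args) {
        runThreads(4, 100000);
        System.out.println("expected: " + (4 * 100000));
        System.out.println("plain:    " + plain + " (may be lower, and can differ between runs)");
        System.out.println("atomic:   " + atomic.get());
    }
}
```

With one thread the two counters always agree, because the order of instructions is known. With several threads, updates to the plain variable can overwrite each other, while the atomic counter stays correct because each change happens in isolation.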

-- After that is a long discussion about different levels of programming and making parallelism more accessible. It is a fascinating discussion, but tougher to tease out parts for this blog.
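The idea of making changes in isolation (33:03 and 33:40) can be sketched in code. The example below is an illustration in Java under my own assumptions, not anything from the interview. The shared state is an immutable snapshot, each change is computed as a pure function of that snapshot, and the new value is committed with a compare-and-set that retries if another thread got there first. It is an optimistic update of a single value, not full Software Transactional Memory.

```java
import java.util.concurrent.atomic.AtomicReference;

class IsolatedTransfer {
    // Immutable snapshot of two balances; a "change" builds a new snapshot.
    static final class Balances {
        final int from;
        final int to;

        Balances(int from, int to) {
            this.from = from;
            this.to = to;
        }

        // Pure function: old value in, new value out, nothing modified.
        Balances transfer(int amount) {
            return new Balances(from - amount, to + amount);
        }
    }

    static final AtomicReference<Balances> state =
            new AtomicReference<Balances>(new Balances(1000, 0));

    // Optimistic update: compute from a snapshot, commit only if nobody
    // else committed in the meantime, otherwise retry with a fresh snapshot.
    static void transfer(int amount) {
        while (true) {
            Balances seen = state.get();
            Balances next = seen.transfer(amount);
            if (state.compareAndSet(seen, next)) {
                return;
            }
        }
    }

    // Runs `threads` threads that each move 1 unit `perThread` times.
    static void runTransfers(int threads, final int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < perThread; i++) {
                        transfer(1);
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
    }

    public static void main(String[] args) {
        runTransfers(4, 250);
        Balances b = state.get();
        System.out.println("from=" + b.from + " to=" + b.to + " total=" + (b.from + b.to));
    }
}
```

No thread ever sees a half-finished transfer, because the two balances only change together when a whole new snapshot is swapped in. The only imperative state change is the single compareAndSet, which is the isolated, sequential piece the talk describes at 26:28.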

Tuesday, January 10, 2012

Something to look forward to

One of the problems that developers have to learn to address is concurrency. I expect this to be a recurring theme on this blog. From what I have seen, Clojure is the language best suited to dealing with concurrency.

Rather than talking about the different problems of multi-threaded programming and how they are addressed with Clojure, as if it is some necessary evil, here is a great presentation about Clojure's strengths before even considering the issue of concurrency.

This is from Stuart Halloway, author of Programming Clojure. This was filmed at Øredev 2009.

It appears Vimeo doesn't want me to embed it, so here is the link:

Clojure - Stuart Halloway

Saturday, January 7, 2012

Why?

Every blog must have a first post. The usual model is to lay out some grand ambitions and ideals to inspire embarrassment or regret in the author later. In lieu of that, I will admit that this is my third attempt at blogging.

The first was a blog about .NET programming that I started when I was playing with WinFX and Indigo, which became Windows Communication Foundation in .NET 3.0. I think I wrote 4 posts. I don't even remember the name of that blog now.

The second was an attempt at non-fiction prose. I wrote a bit more there, but nowhere near as much as I had hoped. That was a grand vision that didn't pan out.

So why a new blog? Because I am more excited about programming than I have been since at least the WinFX beta, possibly even more than when I first switched to .NET in 2002, and maybe since ever.

Computer platforms are mobile, multi-core and distributed. But sometimes developers are so focused on the problems they face day to day that they don't have a chance to see how the world has changed around them.

Think of this blog as the diary of Rip Van Programmer. I have recently woken up and now I have to learn to live in this strange modern world.

This blog is about the process of my learning. I hope that it will contain useful things for others, but I expect to make mistakes along the way. If, a year from now, I can't look back and find mistakes in this blog, it will have been a failure.

If you see mistakes or better ways to do things, please tell me!