First round of Industrial Haskell Group development work

April 28th, 2009 by duncan

The Industrial Haskell Group (IHG) have asked us to get cracking on a number of tasks:

  • Make dynamic/shared libraries work better
  • Make it possible to build GHC without using GMP
  • FFI checker/lint tool
  • Improving hsc2hs/c2hs + Cabal to make it easier to write C wrappers to C functions

We’ll talk in more detail about each one as we tackle them.

Shared libraries

We’ve started on the shared libraries task. This is quite a big area. Lots of people have put a lot of hard work into it already but there’s a fair bit left to do before we have GHC releases using them by default.

A little history

Wolfgang Thaller did a lot of the original work on generating position independent code (PIC) in the native codegen. Clemens Fruhwirth pushed things further along as part of a SoC project. He got shared libs working on Linux and started to address some of the packaging and management issues. GHC version 6.10 actually released with the shared libs code as an experimental feature.

Why do we care about shared libs?

There are several reasons we care. The greatest advantage is that it enables us to make plugins for other programs. There are loads of examples of this: think of plugins for things like vim, gimp, postgres, apache. On Windows if you want to make a COM or .NET component then it usually has to be a shared library (a .dll file).

There has been most demand for this feature from Windows users over the years and for some time it has been possible to generate .dlls using GHC (though it was broken in version 6.10.1). It’s not been an easy feature to use however, and what’s more the current results are not exactly great. While you can currently take a bunch of Haskell modules that export a C API and make a .dll, the .dll file you get is huge. It statically links in the runtime system and all the other Haskell packages. So if you want to use more than one .dll plugin then each one has its own copy of the GHC runtime system and all the libraries! Obviously this is not ideal. Having all these copies of the runtime system and base libs takes more memory, more disk space and slows things down. What everyone really wants is to be able to build the runtime system and each Haskell package as a separate .dll file. Then each plugin should be small and would share the runtime system and other dependencies that they have in common.
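For readers who have not seen one, here is a minimal sketch of a Haskell module that exports a C API, which is the kind of module a plugin is built from. The function `pluginAdd` and the C symbol name `plugin_add` are made up for illustration; only the `foreign export` mechanism itself is standard Haskell FFI.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.Types (CInt)

-- Hypothetical plugin entry point: ordinary Haskell code that uses
-- C-compatible types in its signature.
pluginAdd :: CInt -> CInt -> CInt
pluginAdd x y = x + y

-- Expose the function to C callers under the (made-up) symbol plugin_add.
foreign export ccall "plugin_add" pluginAdd :: CInt -> CInt -> CInt
```

The host program has to initialise the Haskell runtime with hs_init before it calls plugin_add. That runtime is exactly the part that currently gets linked statically into every plugin.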

A somewhat superficial reason is that it makes your “Hello World” program much smaller because it doesn’t have to include a complete copy of the runtime system and half of the base library. It’s true that in most circumstances disk space is cheap, but if you’ve got some corporate shared storage that’s replicated and meticulously backed-up and if each of your 100 “small” Haskell plugins is actually 10MB big, then the disk space does not look quite so cheap.

Using shared libraries also makes things a bit easier for Haskell applications that want to do dynamic code loading. For example GHCi itself currently has to load two copies of the base package: the one that it is statically linked with and another copy that it loads dynamically. With shared libraries it would just end up with another reference to the same copy of the single shared base library.

Shared libs also completely eliminate the need for the “split objs” hack that GHC uses to reduce the size of statically linked programs. This should make our link times a bit quicker.

What we’ll be doing

We’re planning to get things to the stage where a GHC user can make a working plugin on Linux x86, Linux x86-64 and Windows.

As recently as a few days ago people have managed to get GHC HEAD working with shared libraries on Linux x86-64. Since then however we’ve had the new GHC build system land in the HEAD branch. So the first thing I’ve been working on is porting the shared library support to the new build system. So far so good. I’ll report when I’ve got the build to go all the way through.

Platform progress and the Hackathon

April 24th, 2009 by duncan

The Haskell Hackathon last weekend was a great success with more than 50 people attending over the three days. Thanks to the sponsors and local organisers!

If you’ve been to a few of these events you learn that it’s best not to come with too many preconceived ideas for what to work on. Since the point of the hackathon is really collaboration, you end up spending half the time talking and the other half working on cool ideas that other people bring.

I arrived with the general plan to work on the Haskell Platform release, and along with Don Stewart and Lennart Kolmodin we did actually get a bit done. I’m slightly embarrassed to admit that I spent three days at the Haskell Hackathon and wrote no Haskell code, only POSIX shell script and M4 autoconf macros!

Don and I updated the list of packages that will be in the first platform release. There were a few that needed to be bumped after the ghc-6.10.2 release. Our thanks to Ross who had already uploaded all the core and “extra libs” packages to Hackage.

The three of us also worked on making a generic Unix tarball of the platform. The point is for users of distros which do not yet have native packages for the platform to be able to download this tarball and ./configure; make; make install. We even managed to get something working just enough for people to be able to test it (haskell-platform-2009.0.0.tar.gz).

Chris Eidhof and Eelco Lempsink of Tupil designed a cool “Get Haskell” download page.

(The silly caption was Chris’s joke in response to Ganesh’s comment about an earlier design.)

The idea is that we would put this at http://haskell.org/download/ to provide an easy start for new users. For OSX and Windows, the icons would link directly to a download and a page with install and post-install instructions. The Linux icon would link to another page with instructions for each supported distro, or the generic tarball for unsupported distros.

Outside of the Hackathon people have also been working hard on the platform release. If you’re on the mailing list you’ll know that Mikhail Glushenkov has been making great progress on preparing a Windows installer. He’s got a beta version available (HaskellPlatform-2009.0.0-setup.exe). Report feedback in the platform trac ticket #6.

Gregory Collins has also been working hard on a cabal2macpkg tool to generate OSX packages from Cabal packages. He’ll use this for each package in the platform and then bundle them all together (along with ghc) into one installer. He’s been having difficulty with the fact that the package format for OSX Leopard is woefully under-documented.

If you’re someone who prepares distro packages then now is an excellent time to get started making sure you’ve got the correct versions of all the platform packages and making a haskell-platform meta-package. See the platform trac for more details.

Regression testing with Hackage

March 21st, 2009 by duncan

Suppose you wanted to do something rash like release a new version of some important piece of infrastructure like Cabal, haddock or indeed ghc itself. Of course you worry that your sparkling new release might have hidden regressions. If only you could check that you’re not breaking anyone’s code. Well, you can!

We can use the cabal command line tool to do regression testing. Basically we build all of Hackage with the old and new releases and then we compare the build reports to find regressions. Simple!
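The comparison step can be sketched in a few lines. This is only an illustration under simplifying assumptions: the `Outcome` type and the `regressions` function are hypothetical, and a real build report records much more than pass or fail.

```haskell
import qualified Data.Map as Map

-- Hypothetical, heavily simplified build outcome for one package.
data Outcome = BuildOk | BuildFailed
  deriving (Eq, Show)

-- One build report per run: package name mapped to its outcome.
type Report = Map.Map String Outcome

-- Packages that built with the old release but fail with the new one.
-- Packages that appear in only one of the two reports are ignored.
regressions :: Report -> Report -> [String]
regressions old new =
  Map.keys (Map.filter id (Map.intersectionWith regressed old new))
  where
    regressed before after = before == BuildOk && after == BuildFailed
```

Packages that were already failing with the old release are deliberately left out, since they tell us nothing about the new one.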

Let’s look at the details…

The Industrial Haskell Group

March 2nd, 2009 by duncan

We are pleased to announce the creation of the Industrial Haskell Group (IHG). The IHG is an organisation to support the needs of commercial users of the Haskell programming language.

For more information, please see
http://industry.haskell.org/

Currently, the main activity of the IHG is a collaborative development scheme, in which multiple companies fund work on the Haskell development platform to their mutual benefit. The scheme has started with three partners of the IHG, including Galois and Amgen.

More details are available at
http://industry.haskell.org/collab

If your company is interested in joining then please e-mail info@industry.haskell.org

Cabal ticket #500

February 14th, 2009 by duncan

I just opened the 500th Cabal ticket! What, you mean there’s no prize?

I’ll ignore the possibility that this is a sign that Cabal is full of bugs and take the positive view that 500 tickets is a sign of an active, useful project. Nobody bothers complaining about useless projects.

As it happens, the 500th ticket is not a bug but an idea for a project. There are over 1,000 packages on Hackage now and the question is how many of them can be installed simultaneously? This is not just an idle statistic. If two packages cannot be installed with consistent dependencies then it is unlikely that you can use both together in your next project. That is of course a wasted opportunity for code re-use.

The idea is if we can work out the set of packages on Hackage that can all be installed together consistently then we can mark package pages with that information. Basically we would be handing out brownie points. Hopefully we can also influence maintainers of other packages to adjust their dependencies so that their package can also join the happy collection of packages that can all agree on their dependencies.

Actually calculating the maximal set of consistent packages is a bit tricky. It is almost certainly NP-complete in general, but in practice it is probably doable and we can probably live with approximations.
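To give a flavour of what an approximation might look like, here is a greedy sketch. It assumes a much simpler model than real Cabal packages: every package pins an exact version of each of its dependencies, and all the names (`Pins`, `agrees`, `greedyConsistent`) are hypothetical. It is not guaranteed to find the maximal set.

```haskell
import qualified Data.Map as Map

type PkgName = String
type Version = Int

-- Simplifying assumption: a package pins exact versions of its dependencies.
type Pins = Map.Map PkgName Version

-- Two sets of pins agree if every dependency they share has the same version.
agrees :: Pins -> Pins -> Bool
agrees a b = and (Map.elems (Map.intersectionWith (==) a b))

-- Greedy approximation: walk the package list once, keeping each package
-- whose pins agree with everything accepted so far.
greedyConsistent :: [(PkgName, Pins)] -> [PkgName]
greedyConsistent = reverse . snd . foldl step (Map.empty, [])
  where
    step (pins, chosen) (name, deps)
      | agrees pins deps = (Map.union pins deps, name : chosen)
      | otherwise        = (pins, chosen)
```

The result depends on the order of the input list, which is what makes this an approximation. Trying several orderings and keeping the largest result would be one cheap way to improve it.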

In fact it might make quite a good Haskell.org Google Summer of Code project. If you are interested, get in touch.

GHC 6.10.1 released!

November 4th, 2008 by ian

After months of work, GHC 6.10.1 is finally released!

The headlines include a parallel garbage collector, extensible exceptions, haddock 2, Data Parallel Haskell as an extralib, and much more besides.

Well-Typed is pleased to have been able to play its part in making this release happen, with Ian working with the Simons at GHC HQ and many other developers across the world to bring this release together.

Meanwhile, Duncan visited Galois to team up with Don Stewart for some extensive pre-release testing. Their experiments with building all of the packages on Hackage give us confidence that this will be a solid release.

Thanks to everyone who played a part, be it development, testing or otherwise, in this release. We couldn’t have done it without your help!

Haskell Platform talk at the London Haskell Users Group

November 4th, 2008 by duncan

I’m talking about the Haskell Platform at the London Haskell Users Group this Thursday.

It is an extended version of the 10-minute talk Haskell: Batteries Included that Don Stewart and I presented at the recent Haskell Symposium.

Abstract:

Some people pick a programming language because it has the best type system, the best facilities for abstraction or perhaps the fastest compiler. Most people however pick a whole programming platform that lets them solve their problem the quickest or best. A programming platform consists of a language, a compiler, and a set of standard libraries and tools. Other popular programming languages have a standard platform that puts together everything you need to get started.

This talk is about the Haskell Platform. We’ll cover what the Haskell Platform is and who it is for. We’ll also look at the technical infrastructure and the social aspects of how it will be managed.

Not all of the details are set in stone. We need to have a discussion within the Haskell community about how the platform will be managed and extended, especially since it needs buy-in from package maintainers. My hope is to use this talk to kick off that discussion.

zlib and bzlib package updates

November 2nd, 2008 by duncan

I’m pleased to announce updates to the zlib and bzlib packages.

The releases are on Hackage:

What’s new

What’s new in these releases is that the extended API is slightly nicer. The simple API that most packages use is unchanged.

In particular, these functions have different types:

compressWith   :: CompressParams
               -> ByteString -> ByteString
decompressWith :: DecompressParams
               -> ByteString -> ByteString

The CompressParams and DecompressParams types are records of compression/decompression parameters. The functions are used like so:

compressWith   defaultCompressParams { ... }
decompressWith defaultDecompressParams { ... }

There is also a new parameter to control the size of the first output buffer. This lets applications save memory when they happen to have a good estimate of the output size (some apps like darcs know this exactly). By getting a good estimate and (de)compressing into a single-chunk lazy bytestring this lets apps convert to a strict bytestring with no extra copying cost.
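As a rough illustration, the record update style might be used like this. The field names `compressLevel`, `compressBufferSize` and `decompressBufferSize` are the ones I know from later zlib releases, so treat it as an assumption that they match this release exactly. The helper names and the buffer sizes are made up.

```haskell
import qualified Codec.Compression.GZip as GZip
import qualified Data.ByteString.Lazy as Lazy

-- Hypothetical helper: best compression with a small first output buffer,
-- for inputs that we expect to compress to only a few kilobytes.
compressTuned :: Lazy.ByteString -> Lazy.ByteString
compressTuned = GZip.compressWith GZip.defaultCompressParams
  { GZip.compressLevel      = GZip.bestCompression
  , GZip.compressBufferSize = 4 * 1024   -- assumed output size estimate
  }

-- Hypothetical helper: decompress when the caller already knows
-- (or has a good estimate of) the uncompressed size in bytes.
decompressSized :: Int -> Lazy.ByteString -> Lazy.ByteString
decompressSized size = GZip.decompressWith GZip.defaultDecompressParams
  { GZip.decompressBufferSize = size }
```

If the estimate is good then the whole result lands in a single chunk, which is what makes the conversion to a strict bytestring cheap.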

Future directions

The simple API is very unlikely to change.

The current error handling for decompression is not ideal. It just throws exceptions for failures like bad format or unexpected end of stream. This is a tricky area because streaming behaviour does not mix easily with error handling.

One option, which I use in the iconv library, is to have a data type describe the real error conditions, something like:

data DataStream
   = Chunk Strict.ByteString Checksum DataStream
   | Error Error -- for some suitable error type
   | End Checksum

This would come with suitable fold functions and functions to convert to a lazy ByteString. Then people who care about error handling and streaming behaviour can use that type directly. For example it should be trivial to convert to an iterator style.
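Here is a sketch of what the fold and the conversion functions could look like. The `Checksum` and `Error` definitions are placeholders chosen just so the example compiles, and the function names are hypothetical. None of this is in the released packages.

```haskell
import qualified Data.ByteString as Strict
import qualified Data.ByteString.Lazy as Lazy
import Data.Word (Word32)

type Checksum = Word32                        -- placeholder checksum type

data Error = BadFormat | UnexpectedEnd        -- placeholder error type
  deriving (Eq, Show)

data DataStream
   = Chunk Strict.ByteString Checksum DataStream
   | Error Error
   | End Checksum

-- The natural right fold: one argument per constructor.
foldDataStream :: (Strict.ByteString -> Checksum -> a -> a)
               -> (Error -> a)
               -> (Checksum -> a)
               -> DataStream -> a
foldDataStream chunk err end = go
  where
    go (Chunk bs sum' rest) = chunk bs sum' (go rest)
    go (Error e)            = err e
    go (End sum')           = end sum'

-- Lazy conversion: chunks are produced on demand and an error only
-- surfaces (as an exception) when the consumer reaches it.
toLazy :: DataStream -> Lazy.ByteString
toLazy = Lazy.fromChunks
       . foldDataStream (\bs _ rest -> bs : rest) (error . show) (const [])

-- Checked conversion: errors are reported as values, at the cost of
-- consuming the whole stream before anything is returned.
toEither :: DataStream -> Either Error Lazy.ByteString
toEither = fmap Lazy.fromChunks
         . foldDataStream (\bs _ rest -> (bs :) <$> rest) Left (const (Right []))
```

The two conversions show the trade-off directly: `toLazy` keeps the streaming behaviour but falls back on exceptions, while `toEither` gives proper error handling but has to force the entire stream first.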

People have also asked for a continuation style API to give more control over dynamic behaviour like flushing the compression state (e.g. in an HTTP server). Unfortunately this does not look easy. The zlib state is mutable and while this can be hidden in a lazy list, it cannot be hidden if we provide access to intermediate continuations. That is because those continuations can be re-run whereas a lazy list evaluates each element at most once (and with suitable internal locking this is even true for SMP).

Background

The zlib and bzlib packages provide functions for compression and decompression in the gzip, zlib and bzip2 formats. Both provide pure functions on streams of data represented by lazy ByteStrings:

compress, decompress :: ByteString -> ByteString

This makes it easy to use either in memory or with disk or network IO.
For example a simple gzip compression program is just:

import qualified Data.ByteString.Lazy as BS
import qualified Codec.Compression.GZip as GZip

-- Lazily read stdin, compress it and write the result to stdout
main :: IO ()
main = BS.interact GZip.compress

Or you could lazily read in and decompress a .gz file using:

content <- GZip.decompress <$> BS.readFile file

General

Both packages are bindings to the corresponding C libs, so they depend on those external C libraries (except on Windows where we build a bundled copy of the C lib source code). The compression speed is as you would expect since it’s the C lib that is doing all the work.

The zlib package is used in cabal-install to work with .tar.gz files. So it has actually been tested on Windows. It works with all versions of ghc since 6.4 (though it requires Cabal-1.2).

The darcs repos for the development versions live on code.haskell.org:

I’m very happy to get feedback on the API, the documentation or of course any bug reports.

Some ideas for the Future of Cabal

October 7th, 2008 by duncan

I presented a “Tech Talk” today at Galois on some ideas relating to Cabal.

We discussed two topics. Here’s the abstract:

A language for build systems

Build systems are easy to start but hard to get right. We’ll take the view of a language designer and look at where our current tools fall down in terms of safety/correctness and expressiveness.

We’ll then consider some very early ideas about what a build system language should look like and what properties it should have. Currently this takes the form of a design for a build DSL embedded in Haskell.

Constraint solving problems in package deployment

We are all familiar, at least peripherally, with package systems. Every Linux distribution has a notion of packages and most have high level tools to automate the installation of packages and all their dependencies. What is not immediately obvious is that the problem of resolving a consistent set of dependencies is hard, indeed it is NP-complete. It is possible to encode 3-SAT or Sudoku as a query on a specially crafted package repository.

We will look at this problem in a bit more detail and ask if the right approach might be to apply our knowledge about constraint solving rather than the current ad-hoc solvers that most real systems use. My hope is to provoke a discussion about the problem.
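To make the search problem concrete, here is a deliberately naive backtracking resolver. It is a sketch over a toy repository model, with hypothetical names (`Repo`, `Plan`, `solve`) throughout, and it is not how cabal-install or any distro tool actually works.

```haskell
import qualified Data.Map as Map

type Name    = String
type Version = Int

-- A dependency names a package and lists the versions that are acceptable.
type Dep  = (Name, [Version])

-- Toy repository: for each package, its available versions (in order of
-- preference) together with the dependencies of each version.
type Repo = Map.Map Name [(Version, [Dep])]

-- An install plan assigns exactly one version to each chosen package.
type Plan = Map.Map Name Version

-- All plans that satisfy the given goals, found by backtracking search.
-- The list monad does the backtracking: a dead end is just an empty list.
solve :: Repo -> [Dep] -> Plan -> [Plan]
solve _    []                    plan = [plan]
solve repo ((name, ok) : goals) plan =
  case Map.lookup name plan of
    Just v
      | v `elem` ok -> solve repo goals plan    -- already chosen, compatible
      | otherwise   -> []                       -- conflict: backtrack
    Nothing ->
      [ plan'
      | (v, deps) <- Map.findWithDefault [] name repo
      , v `elem` ok
      , plan' <- solve repo (deps ++ goals) (Map.insert name v plan)
      ]
```

Because the result is a lazy list, taking the first element stops the search at the first consistent plan. The worst case is still exponential, which is why it is tempting to reach for a proper constraint solver instead.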

I’ve covered similar ground to the first topic before in a previous blog post on make. This time we looked in a bit more detail at what a solution might look like and what properties it should have. In particular we discussed what the correctness properties of a build system might look like.

Slides from the Haskell Platform talk

September 26th, 2008 by duncan

I promised to post the slides from our talk on the Haskell Platform which Don and I presented at the Haskell Symposium yesterday.

Haskell: Batteries Included

Malcolm did us all a great service by videoing the talks. Unfortunately he had to catch his flight home before our talk so there is no video for that one.

Don did the talk with the slides and I did the live demo. Fortunately the demo worked. We ran the new hackage server on my laptop and we invited people to connect. I demoed uploading a new package and within a few seconds people in the audience were able to download and install it using cabal-install. That of course is old hat to the open source Haskell hackers but part of what we were trying to do is to persuade the academics to make better use of Hackage to publish libraries and tools that they develop as part of their research.

One new thing we demoed was generating build reports and uploading them back to the hackage server. In fact we had several people in the audience upload reports for the new package within 30 seconds of me uploading it. The build reporting is part of the plan for testing the packages in the Haskell Platform but more generally to gather information on what packages build in what environments.