Archive for the ‘Industrial Haskell Group’ Category

Video and slides from the IHG talk at CUFP

Tuesday, September 8th, 2009

I gave the closing talk at the Commercial Users of Functional Programming (CUFP) conference last week about the birth of the Industrial Haskell Group.

Birth of the Industrial Haskell Group

I talked about how we share these language implementations and what opportunities there are to share future development costs. I talked about how we went about setting up the IHG. Finally, I tried to persuade people that improving shared development infrastructure such as Hackage is a modest investment with potentially large benefits.

Here’s the full abstract:

It has long been thought that commercial users of Haskell could benefit from an organisation to support their needs, and that as a side-effect the wider Haskell community could benefit from the actions of such an organisation. The stronger community would in turn benefit the commercial users, in a positive feedback cycle.

At last year’s CUFP, users of several FP languages raised the issue that there was no organisation that they could pay to do the important but boring work of maintaining and improving common infrastructure. Shortly after CUFP, in partnership with major commercial users of Haskell such as Galois and Amgen, we started to set wheels in motion, and in March 2009 we announced the birth of the Industrial Haskell Group (IHG).

The IHG is starting off with a limited set of activities, but already it is having an impact on the state of the Haskell development platform. We expect that as it expands, it will become a significant force driving Haskell forwards.

In this presentation we will talk about the motivation leading to the formation of the IHG, how it has worked thus far and what lessons we can learn that might benefit other FP communities. We will also look at how we can encourage the positive feedback cycle between commercial users and the wider community.

Industrial Haskell Group meeting at CUFP

Wednesday, August 19th, 2009

Following on from the “Birth of the Industrial Haskell Group” talk at CUFP in Edinburgh, we will be having a short meeting to discuss our future plans before everyone heads off to dinner. That’s on the 4th September, initially gathering in a suitable corner of the CUFP location.

If you’re thinking of joining the IHG, or even just interested in what is going on, then please join us.

GHC and Windows DLLs

Friday, July 3rd, 2009

Following on from Duncan’s work on Building plugins as Haskell shared libs, I’ve been working on supporting the same functionality on Windows. The end goal is to have a rts.dll, libHsBase.dll and myPlugin.dll and be able to write things like Excel plugins in Haskell without needing to statically link the whole runtime system and set of libraries into each one.

Windows uses the Portable Executable (PE) format, so the hoops that must be jumped through are different from those for Linux and Mac OS X. Linux uses ELF for its object format, and Mac OS X uses Mach-O. Tool chain programs such as linkers and object file viewers are also different.

One of the immediate issues is to deal with mutually recursive imports between Haskell libraries and the GHC Run Time System (RTS). Clearly, the code for a Haskell library will call the RTS to perform tasks such as allocating memory, throwing exceptions, forking threads and so on. However, the runtime system also calls back into the base library. For example, here is a function from the RTS which helps to create parallel threads:

void createSparkThread (Capability *cap) {
    StgTSO *tso;
    tso = createIOThread (cap, RtsFlags.GcFlags.initialStkSize,
                                  &base_GHCziConc_runSparks_closure);
    postEvent(cap, EVENT_CREATE_SPARK_THREAD, 0, tso->id);
    appendToRunQueue(cap,tso);
}

The variable base_GHCziConc_runSparks_closure is the name of a function closure in the GHC.Conc library which we won’t have code for when we’re linking the RTS.

One of the quirks of Windows is the need to generate so-called “import libraries”. These contain stub code that is used to call a function in a DLL. For example, if code in module main.o wants to call a function fun in a library base.dll, the picture looks something like this:

## in main.o ##################### (linked into main.exe)
main:
    call fun
    ....
    call dword ptr [__imp_fun]
    ....

## in base.lib ###################### (linked into main.exe)
fun:
    jmp dword ptr [__imp_fun]

__imp_fun:
.data
    .dword fun

## in base.dll ######################
fun:
    .. actual code for fun   

In Windows, all calls to a function in a DLL go via the Import Address Table (IAT). This is a table of pointers, and in the example above there is one entry named __imp_fun. There are two ways to use this table. The first way is illustrated by the first call to fun in main.o. This call targets stub code that looks up the pointer from the table and then jumps to it. The second way is to look up the pointer and jump to it directly, but to do this we need to know at code generation time that the function is in an external DLL. A call fun instruction uses a PC-relative offset, and is physically shorter than a call dword ptr [] instruction, so it’s not practical to change one to the other at link time.

The file base.lib is the “import library”, which contains the call stub and the IAT. Import libraries need to be generated independently from the main compiling and linking process, using Windows specific tools. The import library for a particular dll is then linked into every executable (or other dll) that uses it.
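As a rough illustration of the two call styles, here is a portable C sketch that models the IAT slot as a plain function pointer. It is not how a Windows tool chain is actually driven: the names fun_in_dll, imp_fun and fun_stub are made up for this example and stand in for the code in base.dll, the __imp_fun table entry and the stub in base.lib respectively.

```c
/* Stand-in for the real code of fun that would live in base.dll. */
static int fun_in_dll(int x)
{
    return x + 1;
}

/* Stand-in for the IAT entry __imp_fun: a pointer slot that the
   loader would fill in with the address of fun inside the DLL. */
static int (*imp_fun)(int) = fun_in_dll;

/* Stand-in for the stub in the import library: it looks up the
   pointer in the table and jumps to it. */
static int fun_stub(int x)
{
    return imp_fun(x);
}

/* First style: the caller does not know fun is imported, so it
   makes an ordinary call that lands on the stub. */
int call_via_stub(int x)
{
    return fun_stub(x);
}

/* Second style: the caller knows fun is imported and calls
   through the table slot directly, skipping the stub. */
int call_via_slot(int x)
{
    return imp_fun(x);
}
```

Both callers reach the same code. The only difference is whether the compiler knew, at code generation time, that fun lives in a DLL.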

Anyway, I’ve spent the last few days wading through MSDN and the GHC build system, and I think I’ve cataloged at least all the major hoops. I’ll let you know how the jumping goes next post.

GHC, primops and exorcising GMP

Tuesday, June 9th, 2009

GHC uses GMP to implement the Haskell arbitrary-precision Integer type. It’s been this way for ages.

For various reasons using GMP is a slight problem for some users. Some users don’t really make use of Integer and don’t like to have to link to GMP. Since GMP uses the LGPL, if you want to ship closed source programs then you have to link to it dynamically. On Windows static linking is the default so you have to jump through hoops to link it dynamically. Then there are also users who make heavy use of GMP and find that the Integer library is far too limited an interface to GMP. However binding extra GMP functions is complicated by the way that the GHC RTS uses it already (especially the memory management).

So what these people want is a way to build GHC such that the RTS does not directly link to GMP. Then the implementation of Integer should be in a library that is replaceable so that one can use a simple slow implementation, a super-duper binding to GMP or some other “big num” library.

Daniel Peebles, Ian Lynagh and I have been working on this problem recently. Ian’s and my contributions to this are supported by the IHG.

Getting GMP out of the RTS

Before we can think about replacements however we need to disentangle GMP from the RTS and at least move the existing GMP-based Integer implementation into a library. This Integer implementation would remain the default so it still has to be fast. Daniel has managed to rip GMP out of the RTS and we’re now focusing on how to move the GMP binding into its own library.

The difficulty of moving it out of the RTS is that currently almost all the GMP operations are bound as GHC “primops”, as opposed to using the FFI. This is partly historical accident (FFI arrived on the scene relatively late) and partly that due to certain FFI restrictions, the primop route is simpler and faster. The issue is that the wrapper code (around the actual GMP calls) needs to return several results to Haskell land, in particular things like (# Int, ByteArray# #). Using the FFI it is possible to return several results but one has to do it in the time-honoured tradition of C and emulate “out” parameters by passing pointers. The problem with doing that is we would need to do a lot of marshaling: temporarily allocate some memory, pass pointers and read back the results. All this just to return a few integers and pointers. It’s actually more tricky because at the level in the library stack where we have to implement Integer we do not actually have access to the FFI libraries (in fact currently we do not even have access to the IO type).
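To make the “out” parameter idiom concrete, here is a small C sketch. The function add_limbs is hypothetical (it is not part of GMP or the GHC RTS); it only shows a C function handing back two results through pointers, which is the pattern a Haskell caller would have to marshal around.

```c
#include <stdint.h>

/* Hypothetical helper: add two 32-bit limbs and hand back two
   results, the low word of the sum and the carry, through "out"
   parameters. C can only return one value directly, so the
   caller passes pointers to storage for the results. */
void add_limbs(uint32_t a, uint32_t b, uint32_t *sum, uint32_t *carry)
{
    uint64_t wide = (uint64_t)a + (uint64_t)b;

    *sum = (uint32_t)wide;            /* low 32 bits of the result */
    *carry = (uint32_t)(wide >> 32);  /* 1 if the addition overflowed */
}
```

A caller has to provide storage for both results, for example `uint32_t sum, carry; add_limbs(a, b, &sum, &carry);`. On the Haskell side that storage would have to be allocated and read back on every call, which is the overhead the primop route avoids.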

GHC primops

Primops bypass the single-result restrictions inherited from the C calling convention. We can write primops that directly return unboxed tuples, like (# Int, ByteArray# #). Primops (at least out-of-line primops) are implemented in Cmm, which is GHC’s low level intermediate language based on the C-- language. These Cmm functions have to know exactly the internal calling convention that GHC uses, but there is no excess marshaling.

Unfortunately knowledge of the primops has to be baked into the compiler and the Cmm code has to be compiled into the RTS. So that’s no good for implementing Integer in a separate library from the RTS.

What if we could use the FFI to import Cmm functions…

foreign import prim

That would make it possible to have out-of-line primops in a library. The library would contain the compiled .cmm files and the .hs code in the same library would “foreign import” the cmm function. In particular we could then just move the .cmm code we use for wrapping the GMP library calls from the RTS into the integer-gmp package. Then instead of getting primops like plusInteger# from the GHC.Prim module, we would just foreign import them, eg:

foreign import prim "plusInteger" plusInteger#
  :: Int# -> ByteArray#
  -> Int# -> ByteArray#
  -> (# Int#, ByteArray# #)

So that’s what I started implementing today, “foreign import prim”. It needs a slight extension in the lexer, parser, type checker, desugarer, core->stg, and stg->cmm phases. That sounds like a lot but the changes in each bit are pretty small. As a feature it is very similar to foreign C calls and also to primops, so fortunately it can share most code with those existing features. So far it’s going ok, I’ve got it producing convincing looking core, stg and cmm code. Tomorrow I’ll test it and review the design and changes with Simon Marlow.

If this works out ok then it should mean we’re still using the same well-tested gmp binding code and without any extra marshaling overhead. Correctness testing is mostly covered by the existing GHC testsuite. We still want to check the performance of course. To that end, Daniel has been working on an Integer performance benchmark. He’s tried it already using the simple pure-Haskell implementation of Integer. Apparently it does respectably but takes ages to calculate 10000 factorial.

Building plugins as Haskell shared libs

Thursday, May 21st, 2009

This post is a sneak preview about building Haskell shared libraries on Linux. We’ll look at how to use ghc to make a standalone Haskell shared library that exports C functions. We could use this shared library as part of a bigger project (without having to use ghc for the final linking) or we could load it dynamically, e.g. as a plugin in some other program.

This work is being supported by the IHG and it builds on the hard work of several other people over the last few years (see the first post in this series for the history and credits).

Building GHC with shared libs support

For starters you need the latest development version of GHC. See these instructions on getting the sources and doing the configure, build and install steps.

The only non-standard thing you need to do is to use ./configure --enable-shared. Note that this has only been tested on Linux x86-64 and x86, though in the past, the shared lib support has also worked on Linux PPC and OSX PPC.

Currently what you get is a ghc that itself is statically linked but it can build programs and shared libraries that dynamically link against the runtime system and base libraries.

Building programs that use shared libs

For example, for “hello world”:

$ ghc --make -dynamic Hello.hs

It is interesting to look at the output of the ldd program:

$ ldd ./Hello

I’ll not paste the whole output, but here’s a bit of it:

libHSbase-4.0.0.0-ghc6.11.so =>
  /opt/ghc/lib/ghc-6.11/base-4.0.0.0/libHSbase-4.0.0.0-ghc6.11.so
  (0x00007f8959aff000)

(I’ve simplified the ghc version slightly)

If you were to look at the full output what you would notice is that it links against each Haskell package as a separate .so file. What is more, it is able to find the shared libs even though they are not in a standard location like /usr/local/lib. This is because by default it is using the -rpath mechanism. It is also possible to build binaries in a mode that does not embed an rpath which might be more suitable for deployment.

Building shared libs

Suppose we have a module Foo.hs that uses the FFI to export a C function called foo():

module Foo where
import Foreign.C
foreign export ccall foo :: CInt -> CInt
foo :: CInt -> CInt
foo = ...

we can build it into a shared library:

$ ghc --make -dynamic -shared -fPIC Foo.hs -o libfoo.so

We need to use -dynamic, -shared and -fPIC. The -dynamic flag tells ghc at the compile step to produce code so that it can link dynamically to dependent packages. At the link step it tells ghc to actually link dynamically to dependent packages. The -shared flag tells ghc to link a shared library rather than a program. The -fPIC flag tells ghc to make code that is suitable to include into a shared library. If we were to break it down into separate compile and link steps then we would use:

$ ghc -dynamic -fPIC -c Foo.hs
$ ghc -dynamic -shared Foo.o Foo_stub.o -o libfoo.so

In principle you can use -shared without -dynamic in the link step. That would mean statically linking the rts and all the base libraries into your new shared library. This would make a very big, but standalone shared library. However that would require all the static libraries to have been built with -fPIC so that the code is suitable to include into a shared library and we don’t do that at the moment.

If we use ldd again to look at the libfoo.so that we’ve made we will notice that it is missing a dependency on the rts library. This is a problem that we’ve yet to sort out, so for the moment we can just add the dependency ourselves:

$ ghc --make -dynamic -shared -fPIC Foo.hs -o libfoo.so \
  -lHSrts-ghc6.11 -optl-Wl,-rpath,/opt/ghc/lib/ghc-6.11/

The reason it’s not linked in yet is because we need to be able to switch which version of the rts we’re using without having to relink every library. For example we want to be able to switch between the debug, threaded and normal rts versions. It’s quite possible to do this and it just needs a bit more rearranging in the build system to sort it out. Once it’s done you’ll even be able to switch rts at runtime, eg:

$ LD_PRELOAD=/opt/ghc/lib/ghc-6.11/libHSrts_debug-ghc6.11.so ./Hello

Going back to our libfoo.so, now that it is linked against the rts it is completely standalone. We can link it into a C program using just gcc, or we can use dlopen() to load libfoo.so at runtime.

Assuming we’ve got libfoo.so in the current directory, we can link it into a C program:

$ gcc main.c -o main -lfoo -L.

If you use ldd now it’ll tell you that libfoo.so is not found. Remember that the runtime linker doesn’t look in the same places as the static linker. We told the static linker to look in the current directory with the -L. flag. For the dynamic linker we can either move our libfoo.so to /usr/local/lib or we can embed a path into the binary that tells the runtime linker where to look. One particularly neat way to do this is to tell it to look for the library not at an absolute path, but relative to the program itself:

$ gcc main.c -o main -lfoo -L. -Wl,-rpath,'$ORIGIN'

The Linux runtime linker understands the special variable $ORIGIN and interprets it as the location of the executable. This also works on Solaris. Windows and OS X have something similar. This makes it possible to distribute binaries along with shared libraries and have the whole lot fully relocatable.

If we want to load the library and call functions at runtime we would use C code like:

void *dl = dlopen("./libfoo.so", RTLD_LAZY);
int (*foo)(int a) = dlsym(dl, "foo");
printf("%d\n", foo(2500));

In this case we do not need to link our C program against libfoo.so (we just need -ldl for the dynamic linking functions like dlopen).

$ gcc main.c -o main -ldl
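The three-line fragment above skips error handling. As a minimal sketch, the same steps with the checks filled in might look like the following. The helper name call_int_fn and its return convention are assumptions made for this example. Like the fragment above, it does nothing about starting the Haskell runtime system.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: load the shared library at `path`, look up
   `symbol` as an int -> int function, call it with `arg` and store
   the answer in `*result`. Returns 0 on success, or -1 if the
   library cannot be loaded or the symbol cannot be found. */
int call_int_fn(const char *path, const char *symbol, int arg, int *result)
{
    void *dl = dlopen(path, RTLD_LAZY);
    if (dl == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    dlerror();  /* clear any stale error before calling dlsym */
    int (*fn)(int) = (int (*)(int))dlsym(dl, symbol);
    const char *err = dlerror();
    if (err != NULL || fn == NULL) {
        fprintf(stderr, "dlsym failed: %s\n",
                err != NULL ? err : "symbol resolved to NULL");
        dlclose(dl);
        return -1;
    }

    *result = fn(arg);

    /* The handle is left open on purpose: this sketch makes no
       assumption that the library can safely be unloaded. */
    return 0;
}
```

With this helper the earlier fragment becomes `int r; if (call_int_fn("./libfoo.so", "foo", 2500, &r) == 0) printf("%d\n", r);`.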

Now one thing to watch out for is that before you call any exported Haskell function, you have to start up the runtime system. If you just call foo() directly then it’ll emit a helpful error message to remind you. We have to use the C API of the Haskell FFI to initialise the runtime system. This is a little tiresome. In our case it’ll look like:

hs_init(&argc, &argv);
hs_add_root(__stginit_Foo);

The first line is specified by the Haskell FFI. The second is a GHC’ism. It initialises the module containing the function we’re going to call.

If you’re exporting a plugin API then hopefully the API will support some kind of plugin initialisation. In that case you can include the above C code to initialise the rts before any of the Haskell functions get called. We can do that by adding the above initialisation code into a C function and exporting that from our shared lib:


void init (void);
void init (void) { ... }

Then we would add init into our shared lib:

$ ghc -fPIC -c init.c
$ ghc -dynamic -shared Foo.o Foo_stub.o init.o -o libfoo.so \
  -lHSrts-ghc6.11 -optl-Wl,-rpath,/opt/ghc/lib/ghc-6.11/

Of course the calling program has to call init() first.

If you have to support a C API where there is no initialiser then we can use this trick:


static void init (void) __attribute__ ((constructor));
void init (void) { ... }

The constructor attribute means the function will be called on program startup or as soon as the library is loaded via dlopen.
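To see the constructor attribute in action without a Haskell library, here is a small self-contained sketch. The flag runtime_started and the accessor runtime_is_started are invented for this example; in a real plugin the body of init would contain the hs_init and hs_add_root calls shown earlier. The attribute itself is a GCC and Clang extension.

```c
/* Set to 1 once our stand-in for runtime start-up has run. */
static int runtime_started = 0;

/* Runs automatically when the program starts or when the library
   is loaded, before any ordinary caller gets a chance to run.
   In a real plugin this is where the rts would be initialised. */
static void init(void) __attribute__((constructor));
static void init(void)
{
    runtime_started = 1;
}

/* Lets a caller check that the constructor has already run. */
int runtime_is_started(void)
{
    return runtime_started;
}
```

By the time main (or the code that called dlopen) gets control, runtime_is_started() already returns 1, so the calling program never has to know that an initialiser exists.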

Well-Typed at CUFP

Sunday, May 3rd, 2009

The Commercial Users of Functional Programming (CUFP) workshop is in Edinburgh this year, on the 4th September, along with the developer tracks on the 3rd and 5th. Both Duncan and I will be there, as well as at ICFP and the other co-located events. If you’ll be there then and would like to talk to us, either about Well-Typed or about the Industrial Haskell Group (IHG), then drop us an e-mail or just find us during the week.

In the meantime, if you’d like to give a 25 minute talk about your experiences with functional programming at CUFP, then you have just two weeks to submit a proposal. These talks are a great way for everyone to benefit from each other’s experiences. The call says:

Talks are typically 25 minutes long, but can be shorter. They aim to inform participants about how functional programming played out in real-world applications, focusing especially on the re-usable lessons learned, or insights gained. Your talk does not need to be highly technical; for this audience, reflections on the commercial, management, or software engineering aspects are, if anything, more important. You do not need to submit a paper! Talks on the practical application of functional programming with a primarily technical focus may also be appropriate for the adjacent DEFUN 2009 event.

If you are interested in offering a talk, or nominating someone to do so, send an e-mail to francesco(at)erlang-consulting(dot)com or jim(dot)d(dot)grundy(at)intel(dot)com by 15 May 2009 with a short description of what you’d like to talk about or what you think your nominee should give a talk about. Such descriptions should be about one page long.

“Hello world” now only 11k using GHC with shared libs

Tuesday, April 28th, 2009

$ ./Hello.dyn
Hello World!

$ ls -ogh Hello Hello.dyn
411K 2009-04-28 21:59 Hello
 11K 2009-04-28 21:55 Hello.dyn

On Linux x86-64 with GHC using shared libraries a “Hello World” program is now only 11k compared to 411k previously. By comparison, JHC manages 6.4k and an equivalent C program is 6.3k. (All sizes after running strip on the binary.)

As I mentioned earlier, the IHG has asked us to work on improving GHC’s support for shared libraries. I’ve been updating the new GHC build system to support --enable-shared and I’ve just now managed to get the build to go through. I’ll clean up my patches and send them in tomorrow. There are still a number of things to do. I’ve got to run the testsuite with everything built for shared libs. Clemens had this working before so I’m not expecting too many test failures. We also need to set up a GHC buildbot to use --enable-shared so that we do not get regressions.

The next task will be to test that it works to make a Haskell library that exports a C API and to use it as a plugin for some other program. Anyone got any good suggestions for a simple demo plugin? What programs have nice simple plugin APIs?

First round of Industrial Haskell Group development work

Tuesday, April 28th, 2009

The Industrial Haskell Group (IHG) have asked us to get cracking on a number of tasks:

  • Make dynamic/shared libraries work better
  • Make it possible to build GHC without using GMP
  • FFI checker/lint tool
  • Improving hsc2hs/c2hs + Cabal to make it easier to write C wrappers to C functions

We’ll talk in more detail about each one as we tackle them.

Shared libraries

We’ve started on the shared libraries task. This is quite a big area. Lots of people have put a lot of hard work into it already but there’s a fair bit left to do before we have GHC releases using them by default.

A little history

Wolfgang Thaller did a lot of the original work on generating position independent code (PIC) in the native codegen. Clemens Fruhwirth pushed things further along as part of a SoC project. He got shared libs working on Linux and started to address some of the packaging and management issues. GHC version 6.10 was actually released with the shared libs code as an experimental feature.

Why do we care about shared libs?

There are several reasons we care. The greatest advantage is that it enables us to make plugins for other programs. There are loads of examples of this: think of plugins for things like vim, gimp, postgres, apache. On Windows if you want to make a COM or .NET component then it usually has to be as a shared library (a .dll file).

There has been most demand for this feature from Windows users over the years and for some time it has been possible to generate .dlls using GHC (though it was broken in version 6.10.1). It’s not been an easy feature to use however, and what’s more the current results are not exactly great. While you can currently take a bunch of Haskell modules that export a C API and make a .dll, the .dll file you get is huge. It statically links in the runtime system and all the other Haskell packages. So if you want to use more than one dll plugin then each one has its own copy of the GHC runtime system and all the libraries! Obviously this is not ideal. Having all these copies of the runtime system and base libs takes more memory, more disk space and slows things down. What everyone really wants is to be able to build the runtime system and each Haskell package as a separate .dll file. Then each plugin should be small and would share the runtime system and other dependencies that they have in common.

A somewhat superficial reason is that it makes your “Hello World” program much smaller because it doesn’t have to include a complete copy of the runtime system and half of the base library. It’s true that in most circumstances disk space is cheap, but if you’ve got some corporate shared storage that’s replicated and meticulously backed-up and if each of your 100 “small” Haskell plugins is actually 10MB big, then the disk space does not look quite so cheap.

Using shared libraries also makes things a bit easier for Haskell applications that want to do dynamic code loading. For example GHCi itself currently has to load two copies of the base package, the one that it is statically linked with and another copy that it loads dynamically. With shared libraries it would just end up with another reference to the same copy of the single shared base library.

Shared libs also completely eliminate the need for the “split objs” hack that GHC uses to reduce the size of statically linked programs. This should make our link times a bit quicker.

What we’ll be doing

We’re planning to get things to the stage where a GHC user can make a working plugin on Linux x86, Linux x86-64 and Windows.

As recently as a few days ago people have managed to get GHC HEAD working with shared libraries on Linux x86-64. Since then however we’ve had the new GHC build system land in the HEAD branch. So the first thing I’ve been working on is porting the shared library support to the new build system. So far so good. I’ll report when I’ve got the build to go all the way through.