[Haskell-cafe] transparent parallelization
Jan-Willem Maessen
Janwillem.Maessen at sun.com
Tue Sep 18 17:52:11 EDT 2007
On Sep 18, 2007, at 4:24 PM, <bf3 at telenet.be> <bf3 at telenet.be> wrote:
>> http://haskell.org/haskellwiki/GHC/Data_Parallel_Haskell
>
> Wow this is cool stuff! It would be nice to have something like
> this for the Playstation 3 :-)
>
> Regarding parallelism, I wander how this extension will compare to
> Sun's Fortress language, if/when it gets finally released.
The scope of the two is very different. DPH proposes a single rather
flexible data structure---nested Data Parallel Arrays (really as much
list-like as array-like). The underlying data structure is
manipulated using bulk operations like zip, sum, and comprehensions.
By contrast, Fortress defines the notion of a "Generator" which you
can think of as being akin to a parallel version of Data.Traversable
or ListLike, where the fundamental primitive is a generalization of
foldP and mapP. This is strictly more general---we can define many
of the operations in Data.PArr on arbitrary data types, permitting us
to talk about the sum of the elements of a set, or mapping a function
across a distributed array. We can define nested data parallel
arrays in Fortress. There isn't (yet) an equivalent of the RULES
pragma that permits Fortress to optimize combinations of function
calls. However, clever uses of type information and function
overloading let Fortress do some interesting program transformations
of its own (eg early exit for reductions with left zeros). Finally,
Fortress actually has a mechanism for defining new types of
comprehensions (though this isn't in the language specification yet).
The other nice thing about Generators is that we can support
consumption of large or infinite things, if we're very careful about
how we do the consumption. We're planning to write the equivalent of
hGetContents that works over blocks of file data in parallel where
possible, but processes streams as chunks of data become available.
It remains to be seen how this will work out in practice, though.
Our goal is something LazyByteString or rope-like.
So: DPH: available today (-ish), one (very flexible) data structure.
Bulk operations on a data structure run in parallel. Relies on RULES
+ special compiler support (am I getting that right? You can fuse
multiple traversals of a common argument, which isn't doable using
RULES, right?) to make it all run nicely. A well-established
performance model, cribbed from NESL, for the PArr bits.
Fortress: Arrays and lists currently built in. Bulk operations on a
data structure can run in parallel. Ability to define new parallel
types with carefully-tailored traversal (eg we have a PureList that's
based on Ralf Hinze and Ross Paterson's finger tree where traversal
walks the tree structure in parallel). No optimization RULES yet (an
interpreter doesn't optimize), but a good deal of type-based code
selection. Implementation less complete than DPH in general (even
the Generator API is in flux, though the fundamental use of a foldP-
like operation hasn't changed over time).
-Jan-Willem Maessen
Longtime Haskell Hacker
Fortress Library Developer
PS - By the way, working on Generators has increased my suspicion
that comprehensions do NOT have a naturally monadic structure (which
bugged me when I worked on parallel list traversal optimization in pH
back in 1994). It just happens that for cons-lists the two
structures happen to coincide. If anyone else has had similarly
subversive thoughts I'd love to chat.
More information about the Haskell-Cafe
mailing list