[Haskell-cafe] the last mile in haskell performance

Alberto G. Corona agocorona at gmail.com
Sat Nov 14 11:30:12 UTC 2015


Thanks to your insight I took a look at Data.Vector.Unboxed

I was very happy, but there is a caveat. The documentation says:

"In particular, unboxed vectors of pairs are represented as pairs of
unboxed vectors". https://tldrify.com/cfj

It seems that derivingUnbox and the rest of the unboxed vector code do as
much as is possible with what GHC offers for unboxing, which is limited to
the primitive types.

So the only way to pack user-defined data is to create one Vector of an
unboxed primitive type for each field of the data type. If I have a data
type with five fields, the result is five Vectors.

This is nice in some cases, but most of the time it is not. It does not
solve the CPU cache problem, since the fields of a single value are at
least length (Vector) elements apart. I mean that if the vector is
moderately long and the first field is in the cache, the second, third,
etc. may not be. Usually the fields of a value are handled together.
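
As a minimal sketch of the layout described above (the Particle type and
its values are made up for illustration, and the vector package is assumed
to be installed): an unboxed vector of pairs is backed by one unboxed
vector per component, so splitting it into columns is O(1) and reading one
logical element fetches from each column.

```haskell
import qualified Data.Vector.Unboxed as U

-- Hypothetical record with two fields, encoded as a tuple so that the
-- stock Unbox instance for pairs applies.
type Particle = (Int, Double)

particles :: U.Vector Particle
particles = U.fromList [(1, 0.5), (2, 1.5), (3, 2.5)]

-- U.unzip is O(1): it only exposes the two column vectors that already
-- back the vector of pairs.
ids :: U.Vector Int
masses :: U.Vector Double
(ids, masses) = U.unzip particles

main :: IO ()
main = do
  print ids
  print masses
  -- Reading one logical element fetches from both columns.
  print (particles U.! 1)
```

With five fields there would be five such columns, and the five fields of
element i would sit in five separate memory regions.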

This solution will not consistently match C/C++ speed for most
single-threaded programs. Moreover, the necessary packing/unpacking has
an additional cost.

However this is the best that can be done with what GHC offers now.

The solution would be to allow unboxed user-defined types:

data MyUnboxedData = MyUnboxedData# Int# Float# ....

where MyUnboxedData# is an unboxed constructor.

Then there are some questions:
1) Am I wrong?
2) Is it feasible?
3) Is it worth the pain? (My answer: yes. The current situation is not
satisfactory; this is very important for Haskell's success and should be
addressed if there is no insurmountable barrier.)
4) Are there alternatives before making a formal proposal?
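
On question 4, one alternative that I believe exists today, although it has
not come up in this thread, is Data.Vector.Storable from the vector
package: with a hand-written Storable instance, each element is stored as
one contiguous block of bytes, so the fields of a value sit next to each
other. The sketch below is only an illustration under assumptions: the
Particle type is hypothetical, and the 16-byte layout (an Int64 at offset 0
and a Double at offset 8) is chosen by hand for this example.

```haskell
import Data.Int (Int64)
import Foreign.Storable (Storable (..))
import qualified Data.Vector.Storable as S

-- Hypothetical record; the strict, unpacked fields only affect the
-- in-heap constructor, not the layout inside the vector.
data Particle = Particle
  { pId   :: {-# UNPACK #-} !Int64
  , pMass :: {-# UNPACK #-} !Double
  } deriving (Show, Eq)

-- Hand-written layout (an assumption of this sketch): 16 bytes per
-- element, Int64 at byte offset 0 and Double at byte offset 8.
instance Storable Particle where
  sizeOf _    = 16
  alignment _ = 8
  peek p = do
    i <- peekByteOff p 0
    m <- peekByteOff p 8
    return (Particle i m)
  poke p (Particle i m) = do
    pokeByteOff p 0 i
    pokeByteOff p 8 m

particles :: S.Vector Particle
particles = S.fromList [Particle 1 0.5, Particle 2 1.5, Particle 3 2.5]

-- Strict left fold over the packed elements.
totalMass :: Double
totalMass = S.foldl' (\acc p -> acc + pMass p) 0 particles

main :: IO ()
main = do
  print totalMass
  print (particles S.! 1)
```

Every read still goes through peek and every write through poke, so the
packing/unpacking cost mentioned above does not disappear; whether GHC
optimizes it away in a given loop would have to be measured.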


2015-11-13 12:42 GMT+01:00 Alberto G. Corona <agocorona at gmail.com>:

> That is more practical.
> It is a pity that the Unbox class does not interact with the UNPACK
> pragma. Or so it seems.
> The problem that has been going round my head since being exposed to the
> Google philosophy of "count the bytes" is the trade-off when choosing
> containers: either boxed, pure, linked and multithreaded, or unboxed,
> mutable and single-threaded.
> The Haskell philosophy chooses the first, while almost all the mainstream
> languages choose the second. That is another problem for the adoption of
> Haskell. Google people say: we don't need multithreaded programs because
> we run many single-threaded programs on one machine, so we use all the
> cores. Only when there is a single application to run is the extra effort
> of multithreading justified. And this rarely happens in the real world,
> although in scientific, engineering and financial settings it is usual. It
> also happens in distributed settings, but in that last case performance
> per thread and core is also critical, so Haskell is ruled out.
> But it doesn't have to be that way, or at least that is what I think.
> DiffUArrays are internally mutable but have a pure interface. They use a
> kind of versioning. In single-threaded environments they theoretically
> perform at mutable speeds. The versioning of DiffArray is the blend of
> packed and linked structures that can mix the two worlds.
> If unboxing is extended to any kind of user-defined data, the
> versioning idea can be used to build containers that perform at C speeds
> when single-threaded but preserve purity when multithreaded. So it is
> possible to have your cake and eat it too.
> It is even possible to encode balanced binary trees in a compact
> DiffArray, so very fast maps can be used in single-threaded applications
> that are also pure and work multithreaded.
> The goal is to remove the objections to Haskell coming from that side
> of the computer industry by having such containers available without
> forcing the user to know lots of things about the internals of Haskell and
> GHC.
> I do not know if there is anything going on in this direction. Maybe
> I'm being simplistic and there is something that I am missing.
> From experience I know that what sells a language most is not the real
> performance numbers, but the approaches that the language takes and how
> much they promise for the future.
> For example, I can develop a kind of container following this idea that
> performs badly both single-threaded and multithreaded; the early versions
> surely would. But if people understand that the design has the potential
> to be optimized in the future, they will buy the idea and happily accept
> the Haskell language for semi-performance-critical apps, because they
> will have arguments against objections of this kind.
> 2015-11-12 23:56 GMT+01:00 Michael Snoyman <michael at snoyman.com>:
>> How about vector-th-unbox[1]?
>> [1] https://www.stackage.org/package/vector-th-unbox
>> On Thu, Nov 12, 2015 at 2:53 PM, Alberto G. Corona <agocorona at gmail.com>
>> wrote:
>>> There are no examples. It is hard to guess the functionality and the
>>> maturity of the approach.
>>> 2015-11-12 18:56 GMT+01:00 David Kraeutmann <kane at kane.cx>:
>>>> This might be of interest to you:
>>>> https://hackage.haskell.org/package/structs
>>>> On 11/12/2015 6:49 PM, Alberto G. Corona wrote:
>>>> > Looking at this:
>>>> >
>>>> >
>>>> https://downloads.haskell.org/~ghc/6.12.3/docs/html/users_guide/primitives.html
>>>> >
>>>> > It seems that it is impossible to manage data in Haskell within a core
>>>> > without L1 cache faults, except for unboxed arrays of primitive types,
>>>> > since it is impossible to have unboxed arrays of user-defined types.
>>>> >
>>>> > Am I right?
>>>> >
>>>> > This is definitely very bad for tasks that are inherently
>>>> > single-threaded, and in general for the image of Haskell as a
>>>> > practical language.
>>>> >
>>>> > I have more to say about that, but I would like to know first if I'm
>>>> > right, and second if there is any work going on to permit user-defined
>>>> > unboxed datatypes, or if there is some low-level trick for getting
>>>> > them using foreign calls and unsafeCoerce in some way.
>>>> >
>>>> > I know that the language ATS has unboxing a la carte....
>>>> >
>>>> >
>>>> >
>>>> > _______________________________________________
>>>> > Haskell-Cafe mailing list
>>>> > Haskell-Cafe at haskell.org
>>>> > http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>>>> >
>>> --
>>> Alberto.
> --
> Alberto.
