[Haskell-cafe] broken IO support in uvector package, when using non primitive types

Daniel Peebles pumpkingod at gmail.com
Sat Mar 14 14:28:12 EDT 2009


I'm sorry, I didn't mean to imply otherwise.

I can see your point, but maybe it would be even more flexible in that
kind of situation to keep a separate UIO-like API that lets the caller
explicitly request a particular element count? For your large dataset
you could pass the file size (divided by the size of your elements),
while Manlio could store his array sizes in some other form if he
needs to. The semantics of UIO loading would then become "I have a
large chunk of data on a handle that I need to load verbatim, and here
is how much of it to read", which would work on pipes, sockets, and
other non-file sources, while still being useful in your case of
having enormous amounts of data.
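
To make that concrete, here is a rough sketch of what such an
explicit-size loader could look like. hGetArrN is a made-up name, and
it fills a plain Storable buffer rather than uvector's internal
representation, so treat it as an illustration of the idea rather than
a proposed patch:

  {-# LANGUAGE ScopedTypeVariables #-}

  import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrArray, withForeignPtr)
  import Foreign.Storable   (Storable, sizeOf)
  import System.IO          (Handle, hGetBuf)

  -- Hypothetical explicit-size loader: the caller says how many
  -- elements to read, so no length header is expected in the stream,
  -- and it works on pipes and sockets as well as on regular files.
  hGetArrN :: forall e. Storable e => Handle -> Int -> IO (ForeignPtr e, Int)
  hGetArrN h n = do
      fp <- mallocForeignPtrArray n
      let elemSize = sizeOf (undefined :: e)
      got <- withForeignPtr fp $ \p -> hGetBuf h p (n * elemSize)
      -- hGetBuf may return fewer bytes at end of input, so report the
      -- number of complete elements actually read alongside the buffer.
      return (fp, got `div` elemSize)

The caller supplies n however it suits them: file size divided by
element size in your case, or a count stored separately in Manlio's.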

Anyway, I'm clearly in no position to decide on significant API
changes for uvector, but having more than one option in a
high-performance library like this seems like a good thing.

Cheers,
Dan

On Sat, Mar 14, 2009 at 1:20 PM, Malcolm Wallace
<malcolm.wallace at cs.york.ac.uk> wrote:
>> The main issue seems to be that although the semantics of UIO may be
>> arbitrary, Wallace's patch actually broke deserialization for any
>> product-typed UArr, and I'm not sure the benefits are worthwhile
>> (loading a file someone else sent you), given that endianness is
>> not taken into account when loading anyway (so the chances of someone
>> giving you a raw binary file that happens to contain values in the
>> correct endianness seem rather low).
>
> In my experience, having written several libraries in Haskell for
> serialisation and deserialisation, it is highly problematic when a library
> writer decides that all data to be stored began its life in Haskell, and is
> only being serialised in order to be read back in again by the same Haskell
> library.  I have already made that mistake myself in two different libraries
> now, eventually regretting it (and fixing it).
>
> The real utility of serialisation is when it is possible to read data from
> any arbitrary external source, and to write data according to external
> standards.  A library that can only read and write data in its own
> idiosyncratic format is not production-ready at all.
>
> This is why I submitted the patch that enables the uvector library to read
> raw binary data that was not produced by itself.  I had 300GB of data from
> an external source that I needed to deal with efficiently, and uvector was
> the ideal candidate apart from this small design flaw.  And yes, my code
> also had to deal with endianness conversion on this data.
>
> Regards,
>    Malcolm
>
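
As for the endianness conversion Malcolm mentions, doing it as a
separate pass over data that was loaded verbatim is straightforward.
A minimal sketch with a hand-rolled 32-bit byte swap, independent of
uvector itself:

  import Data.Bits ((.&.), (.|.), shiftL, shiftR)
  import Data.Word (Word32)

  -- Reverse the byte order of a 32-bit word (little- <-> big-endian).
  -- Map this over the loaded words with whatever map function the
  -- container at hand provides.
  swap32 :: Word32 -> Word32
  swap32 w =  (w `shiftR` 24)
          .|. ((w `shiftR` 8)  .&. 0x0000ff00)
          .|. ((w `shiftL` 8)  .&. 0x00ff0000)
          .|. (w `shiftL` 24)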

