[Haskell-beginners] Bytestring question

Wed Jan 26 14:28:25 CET 2011

On Wednesday 26 January 2011 01:52:33, Peter Braun wrote:
> Hi everyone,
>
> as an exercise for learning Haskell I'm writing a program that converts
> ASCII STL files (a simple format for 3D model data) into the binary STL
> format. In my first attempt I used normal strings, and the result was
> therefore very slow. Now I have rewritten the program to use lazy
> bytestrings instead.
>
> But well... it got even slower, so I'm probably doing something terribly
> wrong ;)
>
> Here's what I do (the relevant parts):
>
> ...
> asciiFile <- L.readFile (args!!0)
> binHandle <- openBinaryFile (args!!1) WriteMode
> let asciiLines = L.split (c2w '\n') asciiFile
> ...
> parseFile binHandle (Normal, tail asciiLines) -- First line contains a comment
> ...
>
> where L is Data.ByteString.Lazy. readFile ought to be lazy, so it should
> not read the whole file into RAM at this point. But when I split the
> lines and pass them to a function, is this still carried out lazily?

Yes, readFile reads one chunk and only reads the next one when it is
required. I'm not sure exactly how lazy split is: it could stop at the
first newline, or it could split the entire chunk in one go. Either way,
that wouldn't make much difference.
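A minimal sketch of the line-splitting step, assuming the same qualified import `C` for Data.ByteString.Lazy.Char8 that the post uses; `splitLines` and `bodyLines` are hypothetical names. It shows one pitfall of splitting on '\n' and a variant that uses `C.lines` plus `drop 1` in place of `tail`. Swapping in `C.lines` is a suggestion, not something the original program does. The block defines no `main`, so load it into GHCi to try it.

```haskell
import qualified Data.ByteString.Lazy.Char8 as C

-- Splitting on '\n' as the original code does: a file that ends with a
-- newline produces a final empty piece, e.g. "a\nb\n" gives ["a","b",""].
splitLines :: C.ByteString -> [C.ByteString]
splitLines = C.split '\n'

-- C.lines does not produce that trailing empty piece, and drop 1 skips
-- the first (comment) line without failing on an empty file, unlike tail.
bodyLines :: C.ByteString -> [C.ByteString]
bodyLines = drop 1 . C.lines
```

With the original split, parseFile would eventually be handed an empty last line; with `C.lines` that case does not arise.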

>
> parseFile processes a line depending on the StlLineType and then calls
> itself recursively like this:
>
> parseFile :: Handle -> (StlLineType, [L.ByteString]) -> IO ()

Wouldn't the curried type be better?

parseFile :: Handle -> StlLineType -> [L.ByteString] -> IO ()

> ...
> parseFile h (Vertex1, s) = do
>      let vals = extractVertex (head s)
>      L.hPutStr h $ runPut (writeFloatArray vals)
>      parseFile h (Vertex2, tail s)

Please use pattern matching instead of head and tail:

parseFile _ (Vertex1, []) = return () -- or whatever has to happen at the end
parseFile h (Vertex1, (l:ls)) = do
    let vals = extractVertex l
    L.hPutStr h $ runPut (writeFloatArray vals)
    parseFile h (Vertex2, ls)

> ...
>
> extractVertex looks like this:
>
> extractVertex :: L.ByteString -> [Float]
> extractVertex s = let fracs = filter (\n -> L.length n > 0) $ L.split (c2w ' ') s
>                   in  [read (C.unpack (fracs!!1)) :: Float,

Ouch, if you're unpacking everything, what's the point of using
ByteStrings? And splitting ByteStrings into many small pieces is fairly
expensive, too.

Okay, the trouble is that there's no obvious way to parse a Float from a
ByteString, but the bytestring-lexing package can parse Doubles. You could
use that and convert the Doubles to Floats with GHC.Float.double2Float
(or, if you have optimisations turned on, with realToFrac, which should
then be rewritten to double2Float). That should be much faster than
unpacking and using read, particularly since the Read instances of Float
and Double are slow.

>                        read (C.unpack (fracs!!2)) :: Float,
>                        read (C.unpack (fracs!!3)) :: Float]

Instead of list-indexing with (!!), pattern matching gives nicer code.
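A sketch of extractVertex rewritten with pattern matching, assuming vertex lines look like `vertex x y z`. Two changes are assumptions, not part of the post: `C.words` replaces the `L.split`/`filter` pair, and the result is `Maybe [Float]` rather than `[Float]`, so short lines do not crash. `readF` is a hypothetical helper that keeps the post's `read . C.unpack`, so that only the indexing changes here.

```haskell
import qualified Data.ByteString.Lazy.Char8 as C

-- Kept as read . unpack only to isolate the pattern-matching change;
-- this is still the slow part.
readF :: C.ByteString -> Float
readF = read . C.unpack

-- C.words splits on runs of whitespace and never yields empty pieces, so
-- the filter on L.length is no longer needed. The pattern binds the three
-- coordinates after the keyword instead of indexing with (!!).
extractVertex :: C.ByteString -> Maybe [Float]
extractVertex s = case C.words s of
  (_ : x : y : z : _) -> Just (map readF [x, y, z])
  _                   -> Nothing
```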

>
> where C is Data.ByteString.Lazy.Char8. It splits a byte string, filters
> out the whitespaces and converts certain entries to floats. Maybe unpack
> is an expensive operation. Is there a better way to convert a Bytestring
> to float?

You could also try attoparsec and write a real parser for your file;
that should be pretty snappy. attoparsec also provides

double :: Parser Double

(it has no Float parser), and you could then again call double2Float on
the result.
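A minimal sketch of that approach for a single vertex line. It assumes the attoparsec package is installed and that the line is a strict ByteString; `vertexP` and `parseVertex` are hypothetical names, and a whole-file parser would build on the same pieces. For lazy input, attoparsec also has a Data.Attoparsec.ByteString.Lazy module.

```haskell
import qualified Data.Attoparsec.ByteString.Char8 as A
import qualified Data.ByteString.Char8 as B
import Control.Monad (replicateM)
import GHC.Float (double2Float)

-- Parses one "vertex x y z" line (leading whitespace allowed):
-- attoparsec's double for each coordinate, then double2Float.
vertexP :: A.Parser [Float]
vertexP = do
  A.skipSpace
  _ <- A.string (B.pack "vertex")
  xs <- replicateM 3 (A.skipSpace *> A.double)
  return (map double2Float xs)

-- Runs the parser on a single strict line.
parseVertex :: B.ByteString -> Either String [Float]
parseVertex = A.parseOnly vertexP
```

The parser reads the digits straight from the ByteString, so there is neither a split into pieces nor an unpack to String.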

>
> I know, this is bad Haskell code ;) But where is my grand, obvious
> misuse of Bytestring?

Lots of splitting into small pieces and lots of unpacking; both add up to
considerable cost. I also suspect that read takes a substantial amount of
the time, but you pay that with String IO as well.
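To show what avoiding both costs can look like, here is a sketch that parses a vertex line in place, with no splitting and no unpacking, and then converts with double2Float. It deliberately does not use bytestring-lexing, so it needs only the bytestring package and does not depend on that package's API. `readDoubleBS`, `readFloatBS` and `readVertex` are hypothetical hand-rolled helpers that accept only plain decimal notation (optional sign, digits, optional fraction, optional exponent), not NaN or Infinity.

```haskell
import qualified Data.ByteString.Lazy.Char8 as C
import Control.Monad (guard)
import Data.Char (digitToInt, isDigit, isSpace)
import GHC.Float (double2Float)

-- Optional leading sign: True means negative.
sign :: C.ByteString -> (Bool, C.ByteString)
sign s = case C.uncons s of
  Just ('-', t) -> (True, t)
  Just ('+', t) -> (False, t)
  _             -> (False, s)

-- A run of decimal digits: its value, its length and the remaining input.
digits :: C.ByteString -> (Integer, Int, C.ByteString)
digits s = (C.foldl' step 0 ds, fromIntegral (C.length ds), rest)
  where
    (ds, rest) = C.span isDigit s
    step acc c = acc * 10 + toInteger (digitToInt c)

-- Parses numbers such as "3", "-2.25" or "1.000000e-01" directly from the
-- ByteString and returns the unconsumed rest of the input. The value is
-- built exactly as a Rational and rounded once to Double.
readDoubleBS :: C.ByteString -> Maybe (Double, C.ByteString)
readDoubleBS s0
  | ni + nf == 0 = Nothing              -- no digits at all
  | otherwise    = Just (if neg then negate x else x, s4)
  where
    (neg, s1)    = sign s0
    (ip, ni, s2) = digits s1            -- integer part
    (fp, nf, s3) = case C.uncons s2 of  -- optional fraction
      Just ('.', t) -> digits t
      _             -> (0, 0, s2)
    (ex, s4) = case C.uncons s3 of      -- optional exponent
      Just (c, t)
        | c == 'e' || c == 'E'
        , (eneg, t') <- sign t
        , (ev, ne, t'') <- digits t'
        , ne > 0 -> (if eneg then negate ev else ev, t'')
      _ -> (0, s3)
    m = ip * 10 ^ nf + fp               -- all mantissa digits as one integer
    x = fromRational (fromInteger m * 10 ^^ (ex - toInteger nf))

-- Whole-token Float reader: parse a Double, then convert with double2Float.
readFloatBS :: C.ByteString -> Maybe Float
readFloatBS s = case readDoubleBS s of
  Just (d, rest) | C.null rest -> Just (double2Float d)
  _                            -> Nothing

-- Reads "vertex x y z" by skipping spaces and parsing in place, without
-- splitting the line into pieces. Anything after z is ignored.
readVertex :: C.ByteString -> Maybe (Float, Float, Float)
readVertex line = do
    let s = C.dropWhile isSpace line
    guard (C.pack "vertex" `C.isPrefixOf` s)
    (x, r1) <- next (C.drop 6 s)
    (y, r2) <- next r1
    (z, _)  <- next r2
    return (double2Float x, double2Float y, double2Float z)
  where
    -- skip the separating whitespace, then parse one number in place
    next = readDoubleBS . C.dropWhile isSpace
```

Building the value as an exact Rational keeps the rounding simple but is probably not the fastest option; a tuned library parser is likely to do better.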

>
> I'm grateful for any suggestions to improve that code. I'm using GHC,
> version 6.12.1.
>
> Thank you,
> Peter