[Haskell-beginners] Re: Iterating through a list of char...

Thu Apr 29 19:27:54 EDT 2010

Am Donnerstag 29 April 2010 23:13:21 schrieb Jean-Nicolas Jolivet:
> > -- assuming that only characters > 'i' (chr 105) are escaped (or the
> > escape character itself, but that should be dropped regardless).
>
> Sorry, this is my fault for not being clearer about it: I am decoding
> text that is already encoded with a Binary to ASCII encoding... the
> exact encoding process is this:
>
>  1. Fetch a character from the input stream.
>  2. Increment the character's ASCII value by 42, modulo 256
>  3. If the result is a critical character ('\NUL', '\r', '\n' or '='),
> write the escape character ('=') to the output stream and increment
> character's ASCII value by 64, modulo 256. 4. Output the character to
> the output stream.

Okay, so there are no two escape characters in succession.

>
> I am writing a decoder here so obviously I am reversing the process....
> (I remove 42 for regular characters, 106 for special characters...and if
> the result is < 0, I add 256 to it...)
>
> Just adding (yet again) more context information here ;) Still trying to
> fully understand your last suggestions Daniel! :) Thanks again!
>
> Also, I wouldn't want anyone to think I just want someone to write the
> algorithm for me! :)  Like I said, I already have a fully working
> algorithm, but being new to Haskell, I'm looking for ways to optimize it
> (or, really just different ways that I could do it!)
>
> Here's how I am doing it right now (I am using ByteString here,

Then the foldr is not the best option.

> but it should be pretty straightforward anyway)...
> I'm warning you, this is pretty ugly :)
>
> -- Decode an encoded ByteString
> -- the zip + tail method removes the first character so I am adding it
> afterward (ugly)
> decodeByteString :: L.ByteString -> L.ByteString
> decodeByteString str = do
>   let str1 = mapMaybe decodepair (L.zip str(L.tail str))
>   let firstChar = decodechar (ord(L.head str) - 42)
>   L.pack (firstChar:str1)

Is it really necessary to pack it? That's a relatively expensive operation, 
it may be better to have it return a String if you're not doing anything 
with it after decoding except writing it to a file or so.

The zipping is also not the optimal choice, it takes a lot of checking for 
the chunk-boundaries (I assume you're using lazy ByteStrings, since you 
chose the prefix L), and you construct pairs only to deconstruct them 
immediately (the compiler *may* optimise them away, but I'm skeptical).

>
> -- Decode a pair of character, returning either the
> -- decoded character or Nothing
> decodepair :: (Char, Char) -> Maybe Char
> decodepair cs
>
>   | snd(cs) == '='  = Nothing
>   | fst(cs) == '='  = Just (decodechar(ord(snd cs) - 106))
>   | otherwise       = Just (decodechar(ord(snd cs) - 42))
>
> -- Reverse the modulo 256...
> decodechar :: Int -> Char
> decodechar i
>
>   | i < 0 = chr (i + 256)
>   | otherwise = chr i

Since you're doing arithmetic modulo 256, that stuff can be done faster and 
simpler with Word8.

----------------------------------------------------------------------
{-# LANGUAGE BangPatterns #-}

import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString as S
import Data.ByteString.Unsafe (unsafeAt)

escape :: Word8 -> Word8
escape = (+150)

normal :: Word8 -> Word8
normal = (+214)

decodeW :: L.ByteString -> [Word8]
decodeW = dec False . L.toChunks
    where
      dec _ [] = []
      dec esc (str:more) = go esc 0
        where
          !len = S.length str
          {-# INLINE charAt #-}
          charAt :: Int -> Word8
          charAt i = unsafeAt str i
          go !b !i
            | i == len  = dec b more
            | b         = escape (charAt i) : go False (i+1)
            | otherwise = case charAt i of
                            61 -> go True (i+1)
                            c  -> normal c : go False (i+1)

word8ToChar :: Word8 -> Char
word8ToChar = toEnum . fromIntegral

decodeC :: L.ByteString -> String
decodeC = map word8ToChar . decodeW

decodeBS :: L.ByteString -> L.ByteString
decodeBS = L.pack . decodeW
----------------------------------------------------------------------
>
> Jean-Nicolas Jolivet
>
> On 2010-04-29, at 4:50 PM, Daniel Fischer wrote:
> > Am Donnerstag 29 April 2010 21:37:15 schrieb Jean-Nicolas Jolivet:
> >> First I would like to thank everyone for the very interesting replies
> >> and suggestions I got so far!...
> >>
> >> I tried to implement (and at the very least understand) most of
> >> them!...
> >>
> >> To add to the context here, what I am trying to do is:
> >>
> >> -apply a "transformation" to a character (in my case, subtracting 42
> >> to its ASCII value, which I obtain with chr(ord(c) - 42) -if the
> >> character is preceded by a specific character (that would be, an
> >> escape character, in this case '=') then subtract 106 to its value
> >> instead of 42... -if the character is the escape character itself,
> >> '=',  then skip it altogether
> >
> > Ah, that complicates matters a little.
> > - What happens if ord c < 42 (ord c < 106, if c is preceded by the
> > escape character?)
> > - What about escaped escape characters?
> >
> > However,
> >
> > foo xs = catMaybes $ zipWith f (' ':xs) xs
> >    where
> >        f _ '=' = Nothing
> >        f '=' c = Just (chr $ ord c - 106)
> >        f _ c   = Just (chr $ ord c - 42)
> >
> > is still pretty simple, as is the direct recursion
> >
> > foo = go ' '
> >    where
> >        go _ ('=' :cs) = go '=' cs
> >        go '=' (c:cs)  = chr (ord c - 106) : go c cs
> >        go _ (c:cs)    = chr (ord c - 42) : go c cs
> >        go _ _         = []
> >
> > -- assuming that only characters > 'i' (chr 105) are escaped (or the
> > escape character itself, but that should be dropped regardless).
> >
> > fooGen :: Char -> (Char -> Char) -> (Char -> Char) -> String -> String
> > fooGen e esc norm str = catMaybes $ zipWith f (d:str) str
> >    where
> >        d = if e == maxBound then pred e else succ e
> >        f x y
> >
> >          | y == e    = Nothing
> >          | x == e    = Just (esc y)
> >          | otherwise = Just (norm y)
> >
> > is an easy generalisation.
> >
> >> (keeping in mind that the next character needs to be
> >> escaped)...
> >>
> >> I managed to do it, however I'm not totally satisfied in the way I
> >> did it... the problem was that... as I just explained, in some cases,
> >> the character that is being processed has to be "skipped" (and by
> >> that I mean, not added to the resulting string). This happens when
> >> the processed character IS the escape character...
> >>
> >> What I did was to build a List of Maybe Char.... my function does the
> >> proper operation on the character and returns a "Just Char" when the
> >> character is processed, or Nothing when it is the escaped
> >> character... so basically I would end up with something like:  [Just
> >> 'f', Just 'o', Just 'o', Nothing]... I am mapping this using mapMaybe
> >> to end up with a proper String...
> >>
> >> Would there be any more efficient way of doing this?
> >
> > That is already pretty efficient. The direct recursion is probably a
> > bit more efficient, but I don't think the difference will be large.
> >
> >> Considering that
> >> the escape character should NOT be added to the resulting string, is
> >> there any way I can avoid using the Maybe monad?
> >
> > Sure, apart from the direct recursion,
> >
> > fooGen e esc norm str = tail $ foldr f [] (d:str)
> >    where
> >        d = if e == maxBound then pred e else succ e
> >        f x (y:zs)
> >
> >          | y == e    = x:zs
> >          | x == e    = x:esc y:zs
> >          | otherwise = x:norm y:zs
> >
> >          f x [] = [x]
> >
> > catMaybes and zipWith is clearer, though, and I don't think the foldr
> > will perform better.
> >
> >> Once again, thanks everyone for all the suggestions!
> >>
> >> Jean-Nicolas Jolivet