[Haskell-beginners] Re: Iterating through a list of char...
Daniel Fischer
daniel.is.fischer at web.de
Thu Apr 29 19:27:54 EDT 2010
Am Donnerstag 29 April 2010 23:13:21 schrieb Jean-Nicolas Jolivet:
> > -- assuming that only characters > 'i' (chr 105) are escaped (or the
> > escape character itself, but that should be dropped regardless).
>
> Sorry, this is my fault for not being clearer about it: I am decoding
> text that is already encoded with a Binary to ASCII encoding... the
> exact encoding process is this:
>
> 1. Fetch a character from the input stream.
> 2. Increment the character's ASCII value by 42, modulo 256
> 3. If the result is a critical character ('\NUL', '\r', '\n' or '='),
> write the escape character ('=') to the output stream and increment
> character's ASCII value by 64, modulo 256. 4. Output the character to
> the output stream.
Okay, so there are no two escape characters in succession.
>
> I am writing a decoder here so obviously I am reversing the process....
> (I remove 42 for regular characters, 106 for special characters...and if
> the result is < 0, I add 256 to it...)
>
> Just adding (yet again) more context information here ;) Still trying to
> fully understand your last suggestions Daniel! :) Thanks again!
>
> Also, I wouldn't want anyone to think I just want someone to write the
> algorithm for me! :) Like I said, I already have a fully working
> algorithm, but being new to Haskell, I'm looking for ways to optimize it
> (or, really just different ways that I could do it!)
>
> Here's how I am doing it right now (I am using ByteString here,
Then the foldr is not the best option.
> but it should be pretty straightforward anyway)...
> I'm warning you, this is pretty ugly :)
>
> -- Decode an encoded ByteString
> -- the zip + tail method removes the first character so I am adding it
> afterward (ugly)
> decodeByteString :: L.ByteString -> L.ByteString
> decodeByteString str = do
> let str1 = mapMaybe decodepair (L.zip str(L.tail str))
> let firstChar = decodechar (ord(L.head str) - 42)
> L.pack (firstChar:str1)
Is it really necessary to pack it? That's a relatively expensive operation,
it may be better to have it return a String if you're not doing anything
with it after decoding except writing it to a file or so.
The zipping is also not the optimal choice, it takes a lot of checking for
the chunk-boundaries (I assume you're using lazy ByteStrings, since you
chose the prefix L), and you construct pairs only to deconstruct them
immediately (the compiler *may* optimise them away, but I'm skeptical).
>
> -- Decode a pair of character, returning either the
> -- decoded character or Nothing
> decodepair :: (Char, Char) -> Maybe Char
> decodepair cs
>
> | snd(cs) == '=' = Nothing
> | fst(cs) == '=' = Just (decodechar(ord(snd cs) - 106))
> | otherwise = Just (decodechar(ord(snd cs) - 42))
>
> -- Reverse the modulo 256...
> decodechar :: Int -> Char
> decodechar i
>
> | i < 0 = chr (i + 256)
> | otherwise = chr i
Since you're doing arithmetic modulo 256, that stuff can be done faster and
simpler with Word8.
----------------------------------------------------------------------
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString as S
import Data.ByteString.Unsafe (unsafeAt)
escape :: Word8 -> Word8
escape = (+150)
normal :: Word8 -> Word8
normal = (+214)
decodeW :: L.ByteString -> [Word8]
decodeW = dec False . L.toChunks
where
dec _ [] = []
dec esc (str:more) = go esc 0
where
!len = S.length str
{-# INLINE charAt #-}
charAt :: Int -> Word8
charAt i = unsafeAt str i
go !b !i
| i == len = dec b more
| b = escape (charAt i) : go False (i+1)
| otherwise = case charAt i of
61 -> go True (i+1)
c -> normal c : go False (i+1)
word8ToChar :: Word8 -> Char
word8ToChar = toEnum . fromIntegral
decodeC :: L.ByteString -> String
decodeC = map word8ToChar . decodeW
decodeBS :: L.ByteString -> L.ByteString
decodeBS = L.pack . decodeW
----------------------------------------------------------------------
>
> Jean-Nicolas Jolivet
>
> On 2010-04-29, at 4:50 PM, Daniel Fischer wrote:
> > Am Donnerstag 29 April 2010 21:37:15 schrieb Jean-Nicolas Jolivet:
> >> First I would like to thank everyone for the very interesting replies
> >> and suggestions I got so far!...
> >>
> >> I tried to implement (and at the very least understand) most of
> >> them!...
> >>
> >> To add to the context here, what I am trying to do is:
> >>
> >> -apply a "transformation" to a character (in my case, subtracting 42
> >> to its ASCII value, which I obtain with chr(ord(c) - 42) -if the
> >> character is preceded by a specific character (that would be, an
> >> escape character, in this case '=') then subtract 106 to its value
> >> instead of 42... -if the character is the escape character itself,
> >> '=', then skip it altogether
> >
> > Ah, that complicates matters a little.
> > - What happens if ord c < 42 (ord c < 106, if c is preceded by the
> > escape character?)
> > - What about escaped escape characters?
> >
> > However,
> >
> > foo xs = catMaybes $ zipWith f (' ':xs) xs
> > where
> > f _ '=' = Nothing
> > f '=' c = Just (chr $ ord c - 106)
> > f _ c = Just (chr $ ord c - 42)
> >
> > is still pretty simple, as is the direct recursion
> >
> > foo = go ' '
> > where
> > go _ ('=' :cs) = go '=' cs
> > go '=' (c:cs) = chr (ord c - 106) : go c cs
> > go _ (c:cs) = chr (ord c - 42) : go c cs
> > go _ _ = []
> >
> > -- assuming that only characters > 'i' (chr 105) are escaped (or the
> > escape character itself, but that should be dropped regardless).
> >
> > fooGen :: Char -> (Char -> Char) -> (Char -> Char) -> String -> String
> > fooGen e esc norm str = catMaybes $ zipWith f (d:str) str
> > where
> > d = if e == maxBound then pred e else succ e
> > f x y
> >
> > | y == e = Nothing
> > | x == e = Just (esc y)
> > | otherwise = Just (norm y)
> >
> > is an easy generalisation.
> >
> >> (keeping in mind that the next character needs to be
> >> escaped)...
> >>
> >> I managed to do it, however I'm not totally satisfied in the way I
> >> did it... the problem was that... as I just explained, in some cases,
> >> the character that is being processed has to be "skipped" (and by
> >> that I mean, not added to the resulting string). This happens when
> >> the processed character IS the escape character...
> >>
> >> What I did was to build a List of Maybe Char.... my function does the
> >> proper operation on the character and returns a "Just Char" when the
> >> character is processed, or Nothing when it is the escaped
> >> character... so basically I would end up with something like: [Just
> >> 'f', Just 'o', Just 'o', Nothing]... I am mapping this using mapMaybe
> >> to end up with a proper String...
> >>
> >> Would there be any more efficient way of doing this?
> >
> > That is already pretty efficient. The direct recursion is probably a
> > bit more efficient, but I don't think the difference will be large.
> >
> >> Considering that
> >> the escape character should NOT be added to the resulting string, is
> >> there any way I can avoid using the Maybe monad?
> >
> > Sure, apart from the direct recursion,
> >
> > fooGen e esc norm str = tail $ foldr f [] (d:str)
> > where
> > d = if e == maxBound then pred e else succ e
> > f x (y:zs)
> >
> > | y == e = x:zs
> > | x == e = x:esc y:zs
> > | otherwise = x:norm y:zs
> >
> > f x [] = [x]
> >
> > catMaybes and zipWith is clearer, though, and I don't think the foldr
> > will perform better.
> >
> >> Once again, thanks everyone for all the suggestions!
> >>
> >> Jean-Nicolas Jolivet
More information about the Beginners
mailing list