[Haskell-cafe] striping non-alphanumericals

Bob Ippolito bob at redivi.com
Wed Dec 11 22:01:41 UTC 2013


On Wed, Dec 11, 2013 at 1:37 PM, Joerg Fritsch <fritsch at joerg.cc> wrote:

> I have the following code snippet:
>
> import System.IO
> import Data.String.Utils
>
> main = withFile "test.txt" ReadMode $ \handle -> do
>            xs <- getwords handle
>            sequence_ $ map putStrLn (escapeRe xs)
>
> getwords :: Handle -> IO [String]
> getwords h = hGetContents h >>= return . words
>
> What I want to do there is to get e.g. “word,” or “word!” etc. and arrive
> at “word”. I understand that escapeRe may do this. However, I always get
> some sort of mismatch errors like this:
>
> test.hs:6:38:
>     Couldn't match type `Char' with `[Char]'
>     Expected type: [String]
>       Actual type: String
>     In the return type of a call of `escapeRe'
>     In the second argument of `map', namely `(escapeRe xs)'
>     In the second argument of `($)', namely
>       `map putStrLn (escapeRe xs)'
>
> test.hs:6:47:
>     Couldn't match type `[Char]' with `Char'
>     Expected type: String
>       Actual type: [String]
>     In the first argument of `escapeRe', namely `xs'
>     In the second argument of `map', namely `(escapeRe xs)'
>     In the second argument of `($)', namely
>       `map putStrLn (escapeRe xs)'
>
> Now I have three questions:
>
> 1.      Is escapeRe the right function to use here?
>
No, `escapeRe` is not the correct function to use. It escapes a string so
that it can be used as a regular expression matching the given input
literally. It does not remove any characters, so it is not at all what you
want here.


> 2.      What do I do wrong?
>

Well, the types are wrong because `escapeRe` takes a single String, while
`xs` is a list of Strings. You wrote `sequence_ $ map putStrLn (escapeRe xs)`
instead of `sequence_ $ map (putStrLn . escapeRe) xs`. Note that
`sequence_ $ map f xs` can be written as `mapM_ f xs`, which is much shorter
and clearer. This is what I would write:

  mapM_ (putStrLn . escapeRe) xs

That said, `escapeRe` is not at all useful for what you are trying to do.
You should probably use `filter` (from the Prelude) together with
`isAlphaNum` from Data.Char.
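
A minimal sketch of that suggestion, using plain Strings. The helper name
`stripNonAlnum` and the sample sentence are made up for illustration; in
your program `xs` would come from `getwords handle` instead.

```haskell
import Data.Char (isAlphaNum)

-- Hypothetical helper: keep only the alphanumeric characters of a word,
-- so "word," and "word!" both become "word".
stripNonAlnum :: String -> String
stripNonAlnum = filter isAlphaNum

main :: IO ()
main = do
  -- Sample input standing in for the words read from test.txt.
  let xs = words "word, word! done."
  -- Clean each word, then print it on its own line.
  mapM_ (putStrLn . stripNonAlnum) xs
```

Note that `isAlphaNum` is Unicode-aware, so accented letters and non-ASCII
digits are kept as well. If you only want ASCII, you would need a stricter
predicate.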

> 3.      I read in the Real World Haskell book that actually all these
> file/string operations are very very slow. The recommendation is to work
> with bytestrings instead. Is there any (fast) way to strip
> non-alphanumericals from bytestrings?
>
This is true. You should use Text or ByteString for performance. Text is
probably more appropriate for your use case, since you are working with
characters rather than raw bytes. You can efficiently solve this exercise
with functionality from Data.Char, Data.Text, and Data.Text.IO.
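
Something along these lines should work with Text. Treat it as a sketch
under a few assumptions: the helper names `stripNonAlnum`, `cleanWords` and
`printCleanWords` are invented here, words that become empty after
stripping are dropped (you may prefer to keep them), and `main` runs on a
sample string so that it does not depend on test.txt being present.

```haskell
import Data.Char (isAlphaNum)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- Hypothetical helper: drop every non-alphanumeric character from a word.
stripNonAlnum :: T.Text -> T.Text
stripNonAlnum = T.filter isAlphaNum

-- Hypothetical helper: split on whitespace, clean each word, and discard
-- words that end up empty (e.g. a lone "--").
cleanWords :: T.Text -> [T.Text]
cleanWords = filter (not . T.null) . map stripNonAlnum . T.words

-- Read a whole file strictly and print one cleaned word per line.
printCleanWords :: FilePath -> IO ()
printCleanWords path = TIO.readFile path >>= mapM_ TIO.putStrLn . cleanWords

main :: IO ()
main = mapM_ TIO.putStrLn (cleanWords (T.pack "word, word! -- done."))
```

For the original program you would call `printCleanWords "test.txt"` from
`main` instead of using the sample string.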

Note that this sort of question might be more appropriate for
haskell-beginners: http://www.haskell.org/mailman/listinfo/beginners

-bob