[Haskell-cafe] Searching of several substrings (with Data.Text ?)

Tillmann Vogt Tillmann.Vogt at rwth-aachen.de
Tue Jul 5 21:59:04 CEST 2011


On 05.07.2011 21:29, Eric Rasmussen wrote:
> I've been looking into building parsers at runtime (from a config
> file), and in my case it's beneficial to fit them into the context of
> a larger parser with Attoparsec.Text. This code is untested for
> practical use so I doubt you'll see comparable performance to the
> aforementioned regex packages, but it could be worth exploring if you
> need to mix and match parsers or if the definitions can change
> arbitrarily at runtime.
>
> import qualified Data.Text as T
> import Data.Attoparsec.Text
> import Control.Applicative ((<|>), many)
>
> parseLigature x = string (T.pack x)
>
> -- fallback: consume any single character as a one-character Text
> charToText = do
>   c <- anyChar
>   return (T.singleton c)
>
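> -- an empty list has nothing to match, so fail instead of crashing
> buildChain []     = fail "buildChain: no ligatures given"
> buildChain [x]    = parseLigature x
> buildChain (x:xs) = try (parseLigature x) <|> buildChain xs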
>
> -- ordering matters here, so "ffi" comes before "ff" or "fi"
> ligatures = buildChain ["ffi", "th", "ff", "fi", "fl"]
>
> myParser = many (try ligatures <|> charToText)
>
> -- at ghci prompt: parseOnly myParser (T.pack "the fluffiest bunny")
> -- Right ["th","e"," ","fl","u","ffi","e","s","t"," ","b","u","n","n","y"]

Of course, parsec! I should have thought of that.
icu still seems to be the best solution (I had already considered it for
parsing character references), but it is not easy to install on Windows.
So I will wait until cabal can handle this or icu is integrated into the
Haskell Platform.
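
For anyone who wants to avoid both the icu and attoparsec dependencies in the meantime, the same split can be sketched with Data.Text alone. This is only a minimal sketch: the names tokenize and ligatureList are made up for illustration, it tries every pattern at every position (so it will be slow for long pattern lists), and it relies on the list order to prefer "ffi" over "ff" and "fi".

```haskell
import qualified Data.Text as T
import Data.Maybe (listToMaybe, mapMaybe)

-- Hypothetical helper: split the input into tokens. At each position the
-- first pattern (in list order) that is a prefix of the remaining input
-- becomes a token; if none matches, a single character is taken instead.
-- Empty patterns are dropped so that every step consumes input.
tokenize :: [T.Text] -> T.Text -> [T.Text]
tokenize pats = go
  where
    usable = filter (not . T.null) pats

    go input
      | T.null input = []
      | otherwise =
          case listToMaybe (mapMaybe (matchAt input) usable) of
            Just (tok, rest) -> tok : go rest
            Nothing          -> T.singleton (T.head input) : go (T.tail input)

    -- Pair the pattern with whatever follows it, if it is a prefix.
    matchAt input pat = fmap (\rest -> (pat, rest)) (T.stripPrefix pat input)

-- Ordering matters here as well: "ffi" must come before "ff" and "fi".
ligatureList :: [T.Text]
ligatureList = map T.pack ["ffi", "th", "ff", "fi", "fl"]
```

At the ghci prompt, tokenize ligatureList (T.pack "the fluffiest bunny") should give the same token list as the attoparsec version quoted above.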

Thank you all for your help (especially the attoparsec example).

> On Tue, Jul 5, 2011 at 12:09 PM, Bryan O'Sullivan <bos at serpentine.com> wrote:
>> On Tue, Jul 5, 2011 at 11:01 AM, Tillmann Vogt
>> <Tillmann.Vogt at rwth-aachen.de> wrote:
>>> I looked at Data.Text
>>> http://hackage.haskell.org/packages/archive/text/0.5/doc/html/Data-Text.html
>>> and
>>> http://hackage.haskell.org/packages/archive/stringsearch/0.3.3/doc/html/Data-ByteString-Search.html
>>>
>>> but they don't have a function that can search for several substrings
>>> in one pass.
>> Here's what you want:
>> http://hackage.haskell.org/packages/archive/text-icu/0.6.3.4/doc/html/Data-Text-ICU-Regex.html
>> _______________________________________________
>> Haskell-Cafe mailing list
>> Haskell-Cafe at haskell.org
>> http://www.haskell.org/mailman/listinfo/haskell-cafe