[Haskell-cafe] Searching of several substrings (with Data.Text ?)

Eric Rasmussen ericrasmussen at gmail.com
Tue Jul 5 21:29:16 CEST 2011


I've been looking into building parsers at runtime (from a config
file), and in my case it helps to fit them into a larger parser built
with Data.Attoparsec.Text. I haven't tested this code in practical use,
so I doubt it will match the performance of the regex packages
mentioned earlier, but it could be worth exploring if you need to mix
and match parsers or if the definitions can change arbitrarily at
runtime.

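import qualified Data.Text as T
import Data.Attoparsec.Text
-- `many` is imported explicitly; current attoparsec releases don't export it
import Control.Applicative ((<|>), many)

-- match one ligature given as a String
parseLigature :: String -> Parser T.Text
parseLigature x = string (T.pack x)

-- match any single character, returned as a one-character Text
charToText :: Parser T.Text
charToText = do c <- anyChar
                return (T.singleton c)

-- try each ligature in list order; the first one that matches wins
buildChain :: [String] -> Parser T.Text
buildChain []     = fail "no ligatures given"
buildChain [x]    = parseLigature x
buildChain (x:xs) = try (parseLigature x) <|> buildChain xs

-- ordering matters here, so "ffi" comes before "ff" or "fi"
ligatures :: Parser T.Text
ligatures = buildChain ["ffi", "th", "ff", "fi", "fl"]

-- split the input into ligatures and single characters
myParser :: Parser [T.Text]
myParser = many (try ligatures <|> charToText)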

-- at ghci prompt: parseOnly myParser (T.pack "the fluffiest bunny")
-- Right ["th","e"," ","fl","u","ffi","e","s","t"," ","b","u","n","n","y"]
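
Since the ligature list may come from a config file, relying on the
author to put "ffi" before "ff" is fragile. Below is a minimal sketch of
one way around that, assuming it is acceptable to sort the list
longest-first before building the parser. The names ligaturesFrom and
tokenizer are made up for this example; choice, string, anyChar and
parseOnly are the usual attoparsec functions. Empty entries are dropped,
because a parser that succeeds without consuming input would make
`many` loop forever.

```haskell
import Control.Applicative (many, (<|>))
import Data.Attoparsec.Text (Parser, anyChar, choice, parseOnly, string)
import Data.List (sortBy)
import Data.Ord (Down (..), comparing)
import qualified Data.Text as T

-- Build one parser from a list of ligatures known only at runtime
-- (hypothetical helper). Sorting longest-first means "ffi" is always
-- tried before "ff" or "fi", whatever order the list arrived in.
ligaturesFrom :: [String] -> Parser T.Text
ligaturesFrom =
  choice
    . map string
    . sortBy (comparing (Down . T.length))
    . filter (not . T.null)   -- an empty pattern would never consume input
    . map T.pack

-- Split the input into ligatures and single characters (hypothetical helper).
tokenizer :: [String] -> Parser [T.Text]
tokenizer ls = many (ligaturesFrom ls <|> (T.singleton <$> anyChar))
```

At the ghci prompt, something like
parseOnly (tokenizer ["fi", "ff", "th", "fl", "ffi"]) (T.pack "the fluffiest bunny")
should give the same token list as above even though "ffi" is listed
last. With an empty list, choice has no alternatives and the parser
falls back to single characters.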




On Tue, Jul 5, 2011 at 12:09 PM, Bryan O'Sullivan <bos at serpentine.com> wrote:
> On Tue, Jul 5, 2011 at 11:01 AM, Tillmann Vogt
> <Tillmann.Vogt at rwth-aachen.de> wrote:
>>
>> I looked at Data.Text
>> http://hackage.haskell.org/packages/archive/text/0.5/doc/html/Data-Text.html
>> and
>> http://hackage.haskell.org/packages/archive/stringsearch/0.3.3/doc/html/Data-ByteString-Search.html
>>
>> but they don't have a function that can search several substrings in one
>> run.
>
> Here's what you want:
> http://hackage.haskell.org/packages/archive/text-icu/0.6.3.4/doc/html/Data-Text-ICU-Regex.html