From frederic-emmanuel.picca at synchrotron-soleil.fr Tue Dec 10 09:47:41 2019
From: frederic-emmanuel.picca at synchrotron-soleil.fr (PICCA Frederic-Emmanuel)
Date: Tue, 10 Dec 2019 09:47:41 +0000
Subject: [Haskell-beginners] how to parse
Message-ID: 

Hello,

I have a bunch of files named like this

  _data_00006.cbf

I would like to parse them into this type

  data CbfDataFile = CbfDataFile Text Int

where Text is and Int = 6.

I am using attoparsec, and I started to write my parser like this:

  parseCbfDataFile :: String -> Parser CbfDataFile
  parseCbfDataFile s = do
    prefix <- ????
    string "_data_"
    v <- decimal
    string ".cbf"
    return (CbfDataFile prefix v)

So my question is: how do I write this parser? For now I am missing the prefix part, and it also seems to me that I need something else for the numeric part.

Thanks for your help

Frederic

From fa-ml at ariis.it Tue Dec 10 10:28:03 2019
From: fa-ml at ariis.it (Francesco Ariis)
Date: Tue, 10 Dec 2019 11:28:03 +0100
Subject: [Haskell-beginners] how to parse
In-Reply-To: 
References: 
Message-ID: <20191210102803.GA9433@x60s.casa>

On Tue, Dec 10, 2019 at 09:47:41AM +0000, PICCA Frederic-Emmanuel wrote:
> I have a bunch of files names like this
> _data_00006.cbf
>
> [...]
>
> I am using attoparsec, I started to write my parser like this
>
>
> parseCbfDataFile :: String -> Parser CbfDataFile
> parseCbfDataFile s = do
>     prefix <- ????
>     string "_data_"
>     v <- decimal
>     string ".cbf"
>     return (CbfDataFile prefix v)
>
>
> So my question, is how to I write this parser.

`manyTill` [1] should do

[1] https://hackage.haskell.org/package/attoparsec-0.13.2.2/docs/Data-Attoparsec-Combinator.html#v:manyTill

From spiridempt at gmail.com Tue Dec 10 14:00:39 2019
From: spiridempt at gmail.com (F L)
Date: Tue, 10 Dec 2019 22:00:39 +0800
Subject: [Haskell-beginners] Why many .tix files are generated in Haskell Program Coverage
Message-ID: 

I am using Haskell Program Coverage (HPC) to collect coverage information for Pandoc (https://github.com/jgm/pandoc).
However, after adding the -fhpc option to GHC and running the program, two .tix files are generated: test-pandoc.tix in the main directory and pandoc.tix under the "test" directory (test/pandoc.tix). The two files have different contents, and I cannot tell what they mean from their names. Moreover, when I run "hpc show xxx.tix" on them, test/pandoc.tix gives "hpc: hash in tix file for module Main does not match hash in ./.hpc/Main.mix".

So I wonder what the rules are for generating HPC .tix files (possibly including their number and location), and how I can get the correct coverage information.

Thanks very much!

From frederic-emmanuel.picca at synchrotron-soleil.fr Tue Dec 10 20:16:48 2019
From: frederic-emmanuel.picca at synchrotron-soleil.fr (PICCA Frederic-Emmanuel)
Date: Tue, 10 Dec 2019 20:16:48 +0000
Subject: [Haskell-beginners] how to parse
In-Reply-To: <20191210102803.GA9433@x60s.casa>
References: , <20191210102803.GA9433@x60s.casa>
Message-ID: 

> `manyTill` [1] should do
> [1] https://hackage.haskell.org/package/attoparsec-0.13.2.2/docs/Data-Attoparsec-Combinator.html#v:manyTill

It works,

thanks

From toad3k at gmail.com Thu Dec 12 01:38:28 2019
From: toad3k at gmail.com (David McBride)
Date: Wed, 11 Dec 2019 20:38:28 -0500
Subject: [Haskell-beginners] how to parse
In-Reply-To: 
References: <20191210102803.GA9433@x60s.casa>
Message-ID: 

I would actually use Data.Attoparsec.ByteString.Char8.takeWhile1.
  prefix <- takeWhile1 (/= '_')

On Tue, Dec 10, 2019, 15:17 PICCA Frederic-Emmanuel <frederic-emmanuel.picca at synchrotron-soleil.fr> wrote:

> > `manyTill` [1] should do
> > [1]
> https://hackage.haskell.org/package/attoparsec-0.13.2.2/docs/Data-Attoparsec-Combinator.html#v:manyTill
>
> It works,
>
> thanks
> _______________________________________________
> Beginners mailing list
> Beginners at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/beginners
>

From frederic-emmanuel.picca at synchrotron-soleil.fr Thu Dec 12 05:31:56 2019
From: frederic-emmanuel.picca at synchrotron-soleil.fr (PICCA Frederic-Emmanuel)
Date: Thu, 12 Dec 2019 05:31:56 +0000
Subject: [Haskell-beginners] how to parse
In-Reply-To: 
References: <20191210102803.GA9433@x60s.casa> ,
Message-ID: 

The problem is that the prefix can also contain a '_'.

From Leonhard.Applis at protonmail.com Tue Dec 17 10:31:29 2019
From: Leonhard.Applis at protonmail.com (Leonhard Applis)
Date: Tue, 17 Dec 2019 10:31:29 +0000
Subject: [Haskell-beginners] Question on Lazy Evaluation, Maps & Funktions
Message-ID: 

Hi there,

I currently have a task at hand that requires finding an n-tuple of sentences, out of s sentences, for which the distance between the sentences is as large as possible (so the most dissimilar sentences should be chosen).

Especially with growing n, this problem is causing me trouble, because in a naive implementation the number of distances I need to calculate grows exponentially in n and polynomially in s (there are s^n n-tuples). The distance function itself is a drop-in:

  distance :: Text -> Text -> Double

For these questions, the distance can be assumed to run in constant time, and the list of n-tuples to be deepseq-ready.

With my real-world data, this takes a very long time.
Coming from a more imperative environment, one approach I'd take would be to precompute a dictionary of distances, so that instead of recalculating a distance I'd only have to look it up. In my understanding, this would mean building the full dictionary in s * (s-1) steps, followed by nothing but dictionary lookups for the s^n tuples.

But from the little I remember of "Real World Haskell", functions and variables are only evaluated once and hang around until the garbage collector comes around.

Question 1.) Is this right, and does it mean that, if there is no GC in between, distance(a,b) is only calculated once, making the dictionary approach mostly useless?

Question 2.) The function distance is symmetric. Is there a way to help my Haskell (and the evaluation) understand that it only needs to be run once per pair? If not, does it make sense to iterate over the tuples first and "order" them, so that distances only occur in a certain order? Is there a way to create unique, ordered n-tuples in the first place?

Additionally, I am grateful for any kind of help with performance tricks for this case. I know that this is a typical case for multithreading, but I first want to try to find "the best" base algorithm. Also, I think it's a nice question :)

Thank you very much

Leonhard
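For Leonhard's two questions, here is a minimal sketch of the precomputed-dictionary approach combined with generating only unique, ordered tuples. On Question 1: GHC does not, in general, cache the results of function calls, so `distance a b` evaluated in two places is computed twice; only a value bound to a name is shared once it has been evaluated. The sketch therefore builds a `Data.Map.Strict` table over index pairs `(i, j)` with `i < j` (s(s-1)/2 entries, exploiting the symmetry from Question 2) and enumerates index combinations `i1 < i2 < ... < in` instead of all s^n tuples. The toy `distance`, the helper names, and the scoring rule (sum of pairwise distances) are hypothetical stand-ins. The real objective might be, for example, the minimum pairwise distance instead.

```haskell
module Main (main) where

import Data.List (maximumBy, tails)
import Data.Ord (comparing)
import qualified Data.Map.Strict as M
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical stand-in for the real (expensive) distance function:
-- here simply the absolute difference in length.
distance :: Text -> Text -> Double
distance a b = fromIntegral (abs (T.length a - T.length b))

-- All k-element combinations of a list, in order. Each unordered set
-- appears exactly once, so (a, b) and (b, a) are never both generated.
combinations :: Int -> [a] -> [[a]]
combinations 0 _ = [[]]
combinations k xs =
  [y : rest | (y : ys) <- tails xs, rest <- combinations (k - 1) ys]

-- Compute each unordered pair (i < j) exactly once and store it.
-- Data.Map.Strict forces every value as the map is built.
distanceTable :: [Text] -> M.Map (Int, Int) Double
distanceTable sentences =
  M.fromList [((i, j), distance a b) | [(i, a), (j, b)] <- combinations 2 indexed]
  where
    indexed = zip [0 ..] sentences

-- Symmetric lookup: normalise the key so that (j, i) finds (i, j).
lookupDist :: M.Map (Int, Int) Double -> Int -> Int -> Double
lookupDist table i j = table M.! (min i j, max i j)

-- Assumed score: sum of pairwise distances within the tuple.
tupleScore :: M.Map (Int, Int) Double -> [Int] -> Double
tupleScore table ixs = sum [lookupDist table i j | [i, j] <- combinations 2 ixs]

-- Best n-tuple of sentence indices under tupleScore.
-- maximumBy fails on an empty list, i.e. when n exceeds the number of sentences.
bestTuple :: Int -> [Text] -> [Int]
bestTuple n sentences =
  maximumBy (comparing (tupleScore table)) (combinations n [0 .. length sentences - 1])
  where
    table = distanceTable sentences

main :: IO ()
main = do
  let sentences = map T.pack ["a", "abcdef", "abc", "abcdefghij", "ab"]
  print (bestTuple 3 sentences)
```

This reduces the number of tuples examined from s^n to the binomial coefficient C(s, n), and each tuple only performs table lookups. For large s, an array indexed by pair might be faster than a `Map`, and the final `maximumBy` is a natural place to add parallelism later; neither has been measured here.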
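Returning to the "how to parse" thread: here is a minimal sketch of Francesco's `manyTill` suggestion. It also covers Frederic's follow-up that the prefix may contain '_', a case where `takeWhile1 (/= '_')` would stop too early. The sketch assumes attoparsec's `Data.Attoparsec.Text` interface with `OverloadedStrings`. It drops the unused `String` argument from the original signature, and the file name `scan_1_data_00006.cbf` is a made-up example.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Data.Attoparsec.Text
  (Parser, anyChar, decimal, endOfInput, manyTill, parseOnly, string)
import Data.Text (Text)
import qualified Data.Text as T

-- The type from the original question.
data CbfDataFile = CbfDataFile Text Int
  deriving (Show, Eq)

-- Collect characters until the literal "_data_" is found. Underscores in
-- the prefix are fine: manyTill tries the terminator at each position and
-- attoparsec backtracks when that attempt fails.
parseCbfDataFile :: Parser CbfDataFile
parseCbfDataFile = do
  prefix <- manyTill anyChar (string "_data_")
  v <- decimal -- leading zeros are accepted: "00006" becomes 6
  _ <- string ".cbf"
  endOfInput
  return (CbfDataFile (T.pack prefix) v)

main :: IO ()
main = do
  print (parseOnly parseCbfDataFile "scan_1_data_00006.cbf")
  print (parseOnly parseCbfDataFile "_data_00006.cbf")
```

One caveat: `manyTill` stops at the first "_data_" it meets. A prefix that itself contains "_data_" would therefore not parse with this sketch.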