[Haskell-cafe] poor performance when generating random text

Alfredo Di Napoli alfredo.dinapoli at gmail.com
Wed Oct 17 10:45:20 CEST 2012


What about this? I've tested on my pc and seems pretty fast. The trick is
to generate the gen only once. Not sure if the inlines helps, though:

import qualified Data.Text as T
import System.Random.MWC
import Control.Monad
import System.IO
import Data.ByteString as B
import Data.Word (Word8)
import Data.ByteString.Char8 as CB


{- | Converts a Char to a Word8. Took from MissingH -}
c2w8 :: Char -> Word8
c2w8 = fromIntegral . fromEnum


charRangeStart :: Word8
charRangeStart = c2w8 'a'
{-# INLINE charRangeStart #-}

charRangeEnd :: Word8
charRangeEnd = c2w8 'z'
{-# INLINE charRangeEnd #-}

--genString :: Gen RealWorld -> IO B.ByteString
genString g = do
    randomLen <- uniformR (50 :: Int, 450 :: Int) g
    str <- replicateM randomLen $ uniformR (charRangeStart, charRangeEnd) g
    return $ B.pack str


writeCorpus :: FilePath -> IO [()]
writeCorpus file = withFile file WriteMode $ \h -> do
  let size = 100000
  _ <- withSystemRandom $ \gen ->
      replicateM size $ do
        text <- genString gen :: IO B.ByteString
        CB.hPutStrLn h text
  return [()]

main :: IO [()]
main =  writeCorpus "test.txt"



A.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20121017/fbf3c993/attachment.htm>


More information about the Haskell-Cafe mailing list