[Haskell-cafe] Re: Suspected stupid Haskell Question
bertram.felgenhauer at googlemail.com
Thu Oct 18 03:23:52 EDT 2007
Thomas Hartman wrote:
> Since I'm interested in the stack overflow issue, and getting acquainted
> with quickcheck, I thought I would take this opportunity to compare your
> ordTable with some code Yitzchak Gale posted earlier, against Ham's
> original problem.
> As far as I can tell, they're the same. They work on lists up to 100000
> element lists of strings, but on 10^6 size lists I lose patience waiting
> for them to finish.
> Is there a more scientific way of figuring out if one version is better
> than the other by using, say profiling tools?
> Or by reasoning about the code?
No, measuring actual performance is the only way.
> import Data.List
> import qualified Data.Map as M
> import Control.Arrow
> import Test.QuickCheck
> import Test.GenTestData
I couldn't find this one, so I was unable to test anything.
- Your code doesn't contain a main function. Did you compile it?
- Strings are lists; storing a string of n characters needs 12*n
bytes on 32 bit architectures, and 24*n bytes with 64 bits.
1 million strings with 10 characters each will consume 120MB (or 240MB),
without accounting for any overhead for the copying garbage collector.
I expect that you can save some memory (and thus garbage collection
time) by using Data.ByteString.ByteString (which uses about 32 + n
(32 bits) or 64 + n bytes (64 bits), if I remember correctly).
- On using Data.Map vs. sort and group: The main advantage of the first
approach is that duplicates are eliminated as they are found. So in
fact, if you have a list of only 100 different strings, your code
will run in constant memory, assuming the list is built lazily.
Your test case looks like there are only few duplicates, so I'd expect
the Data.Map code to perform a bit worse than the Data.List one. But
as I wrote above, profiling is the only way to find out for sure.
More information about the Haskell-Cafe