[Haskell-cafe] Re: Suspected stupid Haskell Question

Thu Oct 18 03:23:52 EDT 2007

Thomas Hartman wrote:
> Since I'm interested in the stack overflow issue, and getting acquainted 
> with quickcheck, I thought I would take this opportunity to compare your 
> ordTable with some code Yitzchak Gale posted earlier, against Ham's 
> original problem.
> 
> As far as I can tell, they're the same. They work on lists up to 100000 
> element lists of strings, but on 10^6 size lists I lose patience waiting 
> for them to finish. 
> 
> Is there a more scientific way of figuring out if one version is better 
> than the other by using, say profiling tools?
> 
> Or by reasoning about the code?

No, measuring actual performance is the only way.

> t.
> 
> ****************************************
> 
> import Data.List
> import qualified Data.Map as M
> import Control.Arrow
> import Test.QuickCheck
> import Test.GenTestData

I couldn't find this one, so I was unable to test anything.
[snip]

Some ideas:

- Your code doesn't contain a  main  function. Did you compile it?
- Strings are lists; storing a string of n characters needs 12*n
  bytes on 32 bit architectures, and 24*n bytes with 64 bits.

  1 million strings with 10 characters each will consume 120MB (or 240MB),
  without accounting for any overhead for the copying garbage collector.

  I expect that you can save some memory (and thus garbage collection
  time) by using Data.ByteString.ByteString (which uses about 32 + n
  (32 bits) or 64 + n bytes (64 bits), if I remember correctly).
- On using Data.Map vs. sort and group: The main advantage of the first
  approach is that duplicates are eliminated as they are found. So in
  fact, if you have a list of only 100 different strings, your code
  will run in constant memory, assuming the list is built lazily.

  Your test case looks like there are only few duplicates, so I'd expect
  the Data.Map code to perform a bit worse than the Data.List one. But
  as I wrote above, profiling is the only way to find out for sure.

Bertram