optimization question
Hal Daume III
hdaume at ISI.EDU
Mon Feb 23 08:48:55 EST 2004
I have some numbers on this.
I have a list of first names for males from the census data. I have a
function 'male :: String -> Maybe Double' which returns (maybe) the
probability of a person being given that name. I have two versions, one
based on string matching, the other based on building the data into a trie
and then converting the trie into haskell source and the using that to
match). For example, we have:
> module Male where
>
> male "james" = Just 3.318
> male "john" = Just 3.271
> male "robert" = Just 3.143
> male "michael" = Just 2.629
> male "william" = Just 2.451
> male "david" = Just 2.363
> male "richard" = Just 1.703
> ...
> male _ = Nothing
and in the other version, we have
> male = male_start
> where
> male_start ('a':xs) = male_66 xs
> male_start ('b':xs) = male_67 xs
> male_start ('c':xs) = male_68 xs
> male_start ('d':xs) = male_69 xs
> ...
> male_start _ = Nothing
> male_66 ('a':xs) = male_66_66 xs
> male_66 ('b':xs) = male_66_67 xs
> male_66 ('d':xs) = male_66_69 xs
> male_66 ('g':xs) = male_66_72 xs
> ...
Finally, I implemented a version which reads data into a finitemap.
the original database contains 1219 names.
i test this by taking all male names, all female names, randomizing them,
and repeating this data 20 times. this leads to 109880 runs.
i ran each implementation 5 times; the results are:
using trie:
0.890u 0.020s 0:00.90 101.1% 0+0k 0+0io 327pf+0w
0.910u 0.000s 0:00.90 101.1% 0+0k 0+0io 327pf+0w
0.870u 0.030s 0:00.90 100.0% 0+0k 0+0io 327pf+0w
0.910u 0.020s 0:00.93 100.0% 0+0k 0+0io 327pf+0w
0.920u 0.020s 0:00.95 98.9% 0+0k 0+0io 327pf+0w
using string-matching:
10.280u 0.060s 0:10.51 98.3% 0+0k 0+0io 280pf+0w
10.340u 0.030s 0:10.86 95.4% 0+0k 0+0io 279pf+0w
10.310u 0.040s 0:10.72 96.5% 0+0k 0+0io 281pf+0w
10.330u 0.040s 0:10.55 98.2% 0+0k 0+0io 280pf+0w
10.420u 0.020s 0:10.63 98.2% 0+0k 0+0io 280pf+0w
for finitemap:
1.110u 0.020s 0:01.14 99.1% 0+0k 0+0io 195pf+0w
1.110u 0.010s 0:01.14 98.2% 0+0k 0+0io 195pf+0w
1.100u 0.030s 0:01.14 99.1% 0+0k 0+0io 195pf+0w
1.120u 0.010s 0:01.15 98.2% 0+0k 0+0io 195pf+0w
1.190u 0.010s 0:01.24 96.7% 0+0k 0+0io 195pf+0w
so string-matching is terribly slow; using the finitemap is actually
surprisingly fast, though still about 30% slower than the trie version.
perhaps a better-optimized trie version would do better, but it's hard to
say.
On Mon, 23 Feb 2004, Simon Peyton-Jones wrote:
> The trouble is that you probably *don't* want to expand this
> case x of { "foogle" -> e1; _ -> e2 }
> to this
>
> case x of
> c1:x1 -> case c1 of
> 'f' -> case x1 of
> c2:x2 -> case c2 of
> 'o' -> of ....
>
> So GHC generates a series of equality tests instead. A decent
> alternative might be:
>
> generate case expressions when there is more
> than one string in the list, otherwise use an equality test
>
> That would not be hard to do. If it becomes important to you, I'd have
> a go. But before doing so, could you do the work by hand and see if it
> makes a useful performance difference?
>
> Simon
>
> | -----Original Message-----
> | From: glasgow-haskell-users-bounces at haskell.org
> [mailto:glasgow-haskell-users-
> | bounces at haskell.org] On Behalf Of Sven Panne
> | Sent: 22 February 2004 15:32
> | To: John Meacham
> | Cc: glasgow-haskell-users at haskell.org
> | Subject: Re: optimization question
> |
> | John Meacham wrote:
> | > I was wondering if:
> | >
> | > case x of
> | > "foo" -> Foo
> | > "bar" -> Bar
> | > "fuzz" -> Fuzz
> | > "fuzo" -> Fuzo
> | > x -> other .. thing
> | >
> | > would optimize to
> | >
> | > let z = other .. thing in
> | > case x of
> | > ('f':x) -> case x of
> | > ('u':'z': x) ->
> | > "z" -> Fuzz
> | > "o" -> Fuzo
> | > _ -> z
> | > "oo" -> Foo
> | > _ -> z
> | > "bar" -> Bar
> | > _ -> z
> |
> | String literals are handled in a special way in GHC, so your example
> is
> | essentially converted into an if-cascade, which is not what you want.
> | OTOH, if you write the strings in their expanded form like
> ['f','o','o'],
> | you get your optimized version automatically. Perhaps Simon^2 can
> comment
> | on the rationale behind this, I can't remember the reason...
> |
> | Cheers,
> | S.
> |
> | _______________________________________________
> | Glasgow-haskell-users mailing list
> | Glasgow-haskell-users at haskell.org
> | http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
> _______________________________________________
> Glasgow-haskell-users mailing list
> Glasgow-haskell-users at haskell.org
> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
>
--
Hal Daume III | hdaume at isi.edu
"Arrest this man, he talks in maths." | www.isi.edu/~hdaume
More information about the Glasgow-haskell-users
mailing list