[Haskell-beginners] How would you improve this program?
Daniel Fischer
daniel.is.fischer at googlemail.com
Mon Oct 10 14:11:29 CEST 2011
On Monday 10 October 2011, 10:13:45, Lorenzo Bolla wrote:
> A part from style, is there any reason why one should prefer "where"
> over "let..in"?
Undoubtedly, some people have come up with theoretical reasons why one is
generally better than the other. I haven't seen a convincing one yet,
though.
There are some cases where one is clearly better than the other, bindings
scoping over guards,
foo x y z
| a == 0 = ...
| b <= a = ...
| a <= b+c = ...
| otherwise = ...
where
a = notTooExpensive x y
b = soSoExpensive z y
c = reallyExpensive x y z
are much easier to write and read with where. On the other hand,
let binds in expr is an expression, so it's easily usable in contexts
requiring an expression such as lambda abstractions and do-blocks,
map (\x -> let a = h x in if b then foo a x else bar a x) list
do a <- b
c <- d
let e = f a c
bar e c
baz
foo a e
where a where would be awkward at best (though the map example isn't
particularly good, there
map quux list
where
quux x = if b then foo a x else bar a x
where
a = h x
isn't bad and becomes easier to understand if the lambda becomes
complicated enough).
So there are places where one is (clearly) more convenient/better than the
other, but in other places it's purely up to personal preference.
> > You build the map via fromListWith (+). That's bad if you have input
> > with large counts. fromListWith (+) gives you thunks of the form
> > (...(1+1)+1)...+1) as map-entries. That costs a lot of space and time.
> > Better build the map using insertWith',
> >
> > mp = foldl' ins empty words
> > where
> > ins m w = insertWith' (+) w 1 m
>
> Please find the "strict" version here:
> https://github.com/lbolla/stanford-cs240h/blob/master/lab1/lab1.hs.
One further time-saver (in printTokens):
maxCount = maximum $ map count tokens
you need not traverse the entire list of tokens for that,
maxCount = case sortedTokens of
(Token _ c:_) -> c
>
> Plot of times is now:
> https://github.com/lbolla/stanford-cs240h/blob/master/lab1/times.png
>
> Python is still slightly faster, but I'm getting there.
>
> > 3. a Python version using defaultdict.
> >
> > sort is O(n*log n), map and python are O(n*log d), where n is the
> > number of words and d the number of distinct words.
>
> I believe Python's defaultdict have (optimistically) O(1) lookup and
> insert, because it is implemented as a hashmap.
Ah, I tend to forget about hashmaps. Although the O(1) isn't quite true
[calculating the hash isn't bounded time, and the lookup in the bucket need
not be either], for a good hashtable it's true enough that it scales better
than a tree-based map.
More information about the Beginners
mailing list