[Haskell-cafe] Re: Slower with ByteStrings?
apfelmus
apfelmus at quantentunnel.de
Tue May 29 10:26:31 EDT 2007
Mirko Rahn wrote:
>>> from the letters of that word. A letter can be used at most as many
>>> times as it appears in the input word. So, "letter" can only match
>>> words with 0, 1, or 2 t's in them.
>
>> frequencies = map (\x -> (head x, length x)) . group . sort
>> superset xs = \ys -> let y = frequencies ys in
>> length y == lx &&
>> and (zipWith (\(c,i) (d,j) -> c == d && i >= j) x y)
>> where
>> x = frequencies xs
>> lx = length x
>
> As far as I understand the spec, this algorithm is not correct:
>
> superset "ubuntu" "tun" == False
>
> Is at least one 'b' necessary, yes or no?
Oops, you are indeed right, the answer should be "no". I thought I'd
came away without primitive recursion, but here's a correct version
superset xs = superset' x . sort ys
where
x = sort xs
_ `superset` [] = True
[] `superset` _ = False
(x:xs) `superset'` (y:ys)
| x == y = xs `superset` ys
| x < y = xs `superset` (y:ys)
| otherwise = False
> If the answer is no, the
> following algorithm solves the problem and is faster then the one above:
>
> del y = del_acc []
> where del_acc _ [] = mzero
> del_acc v (x:xs) | x == y = return (v++xs)
> del_acc v (x:xs) = del_acc (x:v) xs
>
> super u = not . null . foldM (flip del) u
>
> main = interact $ unlines . filter ("ubuntu" `super`) . lines
The algorithm is correct but it's not faster, xs `super` ys takes
O(n*m) time whereas superset takes O(n * log n + m * log m) time given a
proper sorting algorithm. Here, n = length xs and m = length ys.
Actually, both algorithms are essentially the same except for the
sorting that allows to drop some equality tests.
(Note that memoizing x = sort xs over different ys speeds things up a
bit for the intended application. This way, (sort "ubuntu") is only
computed once and the running time over many ys approaches O(n + m*log m).)
Regards,
apfelmus
PS: Some exercises for the interested reader:
1) Still, the algorithm super has an advantage over superset. Which one?
2) Put xs into a good data structure and achieve a O(m * log n) time for
multiple ys.
3) Is this running time always better than the aforementioned O(n +
m*log m)? What about very large m > n?
More information about the Haskell-Cafe
mailing list