[Haskell-cafe] Re: Function to detect duplicates

Thu Feb 25 20:30:35 EST 2010

Am Freitag 26 Februar 2010 00:57:48 schrieb Rafael Gustavo da Cunha Pereira 
Pinto:
> Just to clarify the issue, I will propose the puzzle:
>
> There is a single 10 digit number that:
>
> 1) uses all ten digits [0..9], with no repetitions
> 2) the number formed by the first digit (right to left, most
> significant) is divisible by one
> 3) the number formed by the first 2 digits (again right to left) is
> divisible by two
> 4) the number formed by the first 3 digits is divisible by three
>  and so on, until:
> 11) the number formed by the first 10 digits (all!) is by 10
>
> Actually this can be solved by a little logic, but I wanted to give a
> try on brute force search using haskell.

Okay, so I won't talk about choosing a better algorithm :)

>
> I am not looking very large lists, but I was expecting a handful of
> small lists.

And these are so short that actually

noneRepeated xs = xs == nub xs

is *faster* than sorting and grouping.

>
> My algorithm follow these steps:
>
> 1) start with an list of empty list ([[]]), call it ds
> 2) I cons each member of [0..9] to ds
> 3) filter using:
>       a) noneRepeated
>       b) (listToNum d) `mod` l == 0, where l is the length of each

Reverse the tests, \l d -> (listToNum d) `mod` l == 0 is cheap in 
comparison to noneRepeated, even with noneRepeated xs = xs == nub xs.

> sublist d (not computed, it is an accumulator that is incremented each
> time I cons) 4) repeat steps 2-3 until l==10
>
>
> So, I represent each possible number as a reversed list of its digits...
> It ran REALLY fast (sub-second).
>
> So, bragging about Haskell with a Smalltalk-lover friend, by showing him
> how clean was the code and how easy was to profile, I figured out that I
> spent 99% on noneRepeated.

That doesn't run long enough to get a reliable profile, even if you reduce 
the tick-time to 1ms.

>
> After changing to the merge sort version, I have 30% on noneRepeated,
> 30% on listToNum and 30% on putStrLn. Pretty good!
>
> Besides, I could brag a little more about Hakell to that specific
> friend!! ;-)
>
>
> Best regards to you all!!
>
> Rafael
>
>
> PS: Here is the original search code, with the bad noneRepeated and
> still using length
>
>
>
> import Data.List
>
> digits=[0..9]
>
> noneRepeated::[Integer]->Bool
> noneRepeated=null.(filter (>1)).(map length).group.sort
>
> listToNum::[Integer]->Integer
> listToNum = (foldl (\a x->10*a+x) 0).reverse

Doesn't really matter, but try to acquire the habit of using foldl' rather 
than foldl (unless you need foldl for its additional laziness). You'll run 
into fewer laziness leaks that way.

>
> check::[Integer]->Bool
> check ds= and [noneRepeated ds, (listToNum ds) `mod` l==0]
>     where l=fromIntegral $ length ds

Use (&&) if you have only two tests.

>
> nextlevel::[[Integer]]->[[Integer]]
> nextlevel dss=filter (check) [d:ds | ds<-dss,d<-digits]

Why not move the checks into the generation,

nextlevel dss = filter ((== 0) . (`mod` l) . listToNum)
                           [d:ds | ds <- dss, d <- digits, d `notElem` ds]
      where
        l = 1 + length (head dss)

or

nextlevel dss =
    let l = 1 + length (head dss)
    in [d:ds | ds <- dss, let n = 10*listToNum ds
             , d <- digits, d `notElem` ds, (n+d) `mod` l == 0]

? At least the d `notElem` ds seems very natural here (and it's more 
efficient, too).

>
> main=do
>     dss<-runlevel 10 0 [[]]
>     print $ map (listToNum) dss
>
> runlevel 0 b dds=return dds
> runlevel a b dds=do
>     let dds'=nextlevel dds
>     putStrLn $ "Level "++(show (b+1))++": "++(show $ length dds')++"
> matches"
>     print $ map (listToNum) dds'
>     runlevel (a-1) (b+1) dds'