[Haskell-cafe] Optimizing nearest-k code

Fri May 25 13:59:30 UTC 2018

Also: are you sure

let rs = subsets xs in rs  ++ map (x:) rs

is more efficient than

subsets xs ++ map (x:) (subsets xs)

I would have assumed these would be the same due to laziness, etc.

On 05/24/2018 12:36 PM, Mark Wotton wrote:
> there are a few things you can do.
>
> I ran it locally and got 3s cpu time and 3gb of allocation, more or less.
>
> first off, your subset implementation computes the subsets twice.
> taking that off took it down to 2s.
>
> After that it's mostly in the Set implementation. Using Map a ()
> instead gives you access to Data.Map.Merge.Strict.merge, which is a
> bit more efficient (going over the two data structures once only)
>
> ```
> module Main where
>
> import Data.List
> import Data.Map.Strict (Map)
> import qualified Data.Map.Merge.Strict as Map
> import qualified Data.Map.Strict as Map
>
> nearest_k :: (Ord a) => Int -> [(Map a (), v)] -> Map a () -> [(Map a (), v)]
> nearest_k k bs b = take k bs' where
>     bs' = sortOn (hamming b) bs
>
> hamming :: (Ord a)=> Map a () -> (Map a (), v) -> Int
> hamming x (y, _) = hamming_distance x y
>
> hamming_distance :: (Ord a)=> Map a () -> Map a () -> Int
> hamming_distance xs ys = Map.size (Map.merge Map.preserveMissing
> Map.preserveMissing (Map.zipWithMaybeMatched (\_ _ _ -> Nothing)) xs
> ys)
>
> subsets :: [a] -> [[a]]
> subsets []  = [[]]
> subsets (x:xs) = let rs = subsets xs in rs  ++ map (x:) rs
>
> int_lists :: [[Int]]
> int_lists = subsets [1..20]
>
> values :: [(Map Int (), Int)]
> values = map f (zip [1..] int_lists) where
>     f (i, x) = (Map.fromList(zip  x (repeat ())), i)
>
> test = nearest_k 8 values (Map.fromList (zip [1,2,3] (repeat ())))
>
>
> main = print test
> ```
>
> that took it to 1s, and now profiling indicates more than half the
> time is spent in generation of the test values, so I'll leave it
> there.
>
> I think if you wanted to do better than this you'd have to do some
> algorithmic changes - for instance, once your worst candidate is n
> steps away, you can stop calculating the hamming distance for anything
> else once it's > n, as it can't contribute usefully to the nearest
> neighbours.
>
> cheers,
> Mark
>
> On Thu, May 24, 2018 at 12:15 PM, Richard Evans
> <richardprideauxevans at gmail.com> wrote:
>> Dear Haskell Cafe,
>>
>> Given a set of sets, and a particular target set, I want to find the sets
>> that are nearest (in terms of Hamming distance) to the target set.
>>
>> I am using the following code:
>>
>> import Data.List
>> import qualified Data.Set as Set
>>
>> nearest_k :: Ord a => Int -> [(Set.Set a, v)] -> Set.Set a -> [(Set.Set a,
>> v)]
>> nearest_k k bs b = take k bs' where
>>     bs' = sortOn (hamming b) bs
>>
>> hamming :: Ord a => Set.Set a -> (Set.Set a, v) -> Int
>> hamming x (y, _) = hamming_distance x y
>>
>> hamming_distance :: Ord a => Set.Set a -> Set.Set a -> Int
>> hamming_distance xs ys = Set.size (Set.difference xs ys) + Set.size
>> (Set.difference ys xs)
>>
>>
>>
>> subsets :: [a] -> [[a]]
>> subsets []  = [[]]
>> subsets (x:xs) = subsets xs ++ map (x:) (subsets xs)
>>
>> int_lists :: [[Int]]
>> int_lists = subsets [1..20]
>>
>> values :: [(Set.Set Int, Int)]
>> values = map f (zip [1..] int_lists) where
>>     f (i, x) = (Set.fromList x, i)
>>
>> test = nearest_k 8 values (Set.fromList [1,2,3])
>>
>> ----
>>
>> This works ok for the test above (with sets of ints), but is rather slow in
>> my actual application (in which the sets are large sets of ground atoms of
>> first-order logic). Is there some major optimization I should be doing here?
>>
>> thanks,
>> Richard
>>
>> _______________________________________________
>> Haskell-Cafe mailing list
>> To (un)subscribe, modify options or view archives go to:
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>> Only members subscribed via the mailman list are allowed to post.
>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: OpenPGP digital signature
URL: <http://mail.haskell.org/pipermail/haskell-cafe/attachments/20180525/1396018a/attachment.sig>