[Haskell-cafe] Optimizing nearest-k code

Richard Evans richardprideauxevans at gmail.com
Thu May 24 18:03:07 UTC 2018


Thanks Mark, this is very helpful.



On Thu, May 24, 2018 at 6:36 PM, Mark Wotton <mwotton at gmail.com> wrote:

> there are a few things you can do.
>
> I ran it locally and got 3s cpu time and 3gb of allocation, more or less.
>
> first off, your subset implementation computes the subsets twice.
> taking that off took it down to 2s.
>
> After that it's mostly in the Set implementation. Using Map a ()
> instead gives you access to Data.Map.Merge.Strict.merge, which is a
> bit more efficient (going over the two data structures once only)
>
> ```
> module Main where
>
> import Data.List
> import Data.Map.Strict (Map)
> import qualified Data.Map.Merge.Strict as Map
> import qualified Data.Map.Strict as Map
>
> nearest_k :: (Ord a) => Int -> [(Map a (), v)] -> Map a () -> [(Map a (),
> v)]
> nearest_k k bs b = take k bs' where
>     bs' = sortOn (hamming b) bs
>
> hamming :: (Ord a)=> Map a () -> (Map a (), v) -> Int
> hamming x (y, _) = hamming_distance x y
>
> hamming_distance :: (Ord a)=> Map a () -> Map a () -> Int
> hamming_distance xs ys = Map.size (Map.merge Map.preserveMissing
> Map.preserveMissing (Map.zipWithMaybeMatched (\_ _ _ -> Nothing)) xs
> ys)
>
> subsets :: [a] -> [[a]]
> subsets []  = [[]]
> subsets (x:xs) = let rs = subsets xs in rs  ++ map (x:) rs
>
> int_lists :: [[Int]]
> int_lists = subsets [1..20]
>
> values :: [(Map Int (), Int)]
> values = map f (zip [1..] int_lists) where
>     f (i, x) = (Map.fromList(zip  x (repeat ())), i)
>
> test = nearest_k 8 values (Map.fromList (zip [1,2,3] (repeat ())))
>
>
> main = print test
> ```
>
> that took it to 1s, and now profiling indicates more than half the
> time is spent in generation of the test values, so I'll leave it
> there.
>
> I think if you wanted to do better than this you'd have to do some
> algorithmic changes - for instance, once your worst candidate is n
> steps away, you can stop calculating the hamming distance for anything
> else once it's > n, as it can't contribute usefully to the nearest
> neighbours.
>
> cheers,
> Mark
>
> On Thu, May 24, 2018 at 12:15 PM, Richard Evans
> <richardprideauxevans at gmail.com> wrote:
> > Dear Haskell Cafe,
> >
> > Given a set of sets, and a particular target set, I want to find the sets
> > that are nearest (in terms of Hamming distance) to the target set.
> >
> > I am using the following code:
> >
> > import Data.List
> > import qualified Data.Set as Set
> >
> > nearest_k :: Ord a => Int -> [(Set.Set a, v)] -> Set.Set a -> [(Set.Set
> a,
> > v)]
> > nearest_k k bs b = take k bs' where
> >     bs' = sortOn (hamming b) bs
> >
> > hamming :: Ord a => Set.Set a -> (Set.Set a, v) -> Int
> > hamming x (y, _) = hamming_distance x y
> >
> > hamming_distance :: Ord a => Set.Set a -> Set.Set a -> Int
> > hamming_distance xs ys = Set.size (Set.difference xs ys) + Set.size
> > (Set.difference ys xs)
> >
> >
> >
> > subsets :: [a] -> [[a]]
> > subsets []  = [[]]
> > subsets (x:xs) = subsets xs ++ map (x:) (subsets xs)
> >
> > int_lists :: [[Int]]
> > int_lists = subsets [1..20]
> >
> > values :: [(Set.Set Int, Int)]
> > values = map f (zip [1..] int_lists) where
> >     f (i, x) = (Set.fromList x, i)
> >
> > test = nearest_k 8 values (Set.fromList [1,2,3])
> >
> > ----
> >
> > This works ok for the test above (with sets of ints), but is rather slow
> in
> > my actual application (in which the sets are large sets of ground atoms
> of
> > first-order logic). Is there some major optimization I should be doing
> here?
> >
> > thanks,
> > Richard
> >
> > _______________________________________________
> > Haskell-Cafe mailing list
> > To (un)subscribe, modify options or view archives go to:
> > http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> > Only members subscribed via the mailman list are allowed to post.
>
>
>
> --
> A UNIX signature isn't a return address, it's the ASCII equivalent of a
> black velvet clown painting. It's a rectangle of carets surrounding a
> quote from a literary giant of weeniedom like Heinlein or Dr. Who.
>         -- Chris Maeda
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/haskell-cafe/attachments/20180524/885b3db1/attachment.html>


More information about the Haskell-Cafe mailing list