[Haskell-cafe] Optimizing nearest-k code

Thu May 24 16:15:45 UTC 2018

Dear Haskell Cafe,

Given a set of sets, and a particular target set, I want to find the sets
that are nearest (in terms of Hamming distance) to the target set.

I am using the following code:

import Data.List
import qualified Data.Set as Set

nearest_k :: Ord a => Int -> [(Set.Set a, v)] -> Set.Set a -> [(Set.Set a,
v)]
nearest_k k bs b = take k bs' where
    bs' = sortOn (hamming b) bs

hamming :: Ord a => Set.Set a -> (Set.Set a, v) -> Int
hamming x (y, _) = hamming_distance x y

hamming_distance :: Ord a => Set.Set a -> Set.Set a -> Int
hamming_distance xs ys = Set.size (Set.difference xs ys) + Set.size
(Set.difference ys xs)

subsets :: [a] -> [[a]]
subsets []  = [[]]
subsets (x:xs) = subsets xs ++ map (x:) (subsets xs)

int_lists :: [[Int]]
int_lists = subsets [1..20]

values :: [(Set.Set Int, Int)]
values = map f (zip [1..] int_lists) where
    f (i, x) = (Set.fromList x, i)

test = nearest_k 8 values (Set.fromList [1,2,3])

----

This works ok for the test above (with sets of ints), but is rather slow in
my actual application (in which the sets are large sets of ground atoms of
first-order logic). Is there some major optimization I should be doing
here?

thanks,
Richard
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/haskell-cafe/attachments/20180524/a77b9483/attachment.html>