[Haskell-cafe] Library for Sparse Vectors?
Alexander Solla
alex.solla at gmail.com
Tue Jul 26 23:46:46 CEST 2011
On Tue, Jul 26, 2011 at 1:30 PM, dokondr <dokondr at gmail.com> wrote:
> Hi,
> Can't find on hackage any sparse vector library. Does such thing exist?
> I need efficient storage and dot product calculation for very sparse
> vectors with about 10 out of 40 000 non-zero components.
> One solution would be to represent Sparse Vector as Data.Map with
> (component_index, component_value) pairs to store non-zero components of the
> vector.
>
I would make a different suggestion:
Store a Set (or maybe an IntMap) of (IntMap scalar)s.
In other words, let each vector be represented by an IntMap whose key
represents the n'th component of the vector, and whose value is the proper
scalar.
> In this case, for example, calculating cosine similarity (
> http://en.wikipedia.org/wiki/Cosine_similarity) for for every pair of 10
> 000 vectors, would not be very nice and efficient, I am afraid.
>
Given two (IntMap Double)s a and b, I would compute the projection of a
along b as
cosineSimilarity :: IntMap Double -> IntMap Double -> Double
cosineSimilarity a b = (dot a b) / ((norm a) * (norm b)) where
dot = sum . elems . intersectionWith (*)
norm = (**0.5) . sum . fmap (**2) . elems
The only part I find tricky is enumerating all 10000^2 pairs
pairs :: Int -> [Int]
pairs dim = do
x <- [1..dim]
y <- [1..dim]
return (x,y)
and computing the projection for each pair:
projections :: Floating scalar => IntMap (IntMap scalar) -> Map (Int, Int)
scalar
projections space = let dimensions = undefined -- find max key in elements
of space?
m_projection space x y = cosineSimilarity <$> lookup
x space
<*> lookup
y space
in fromList . filter (isMaybe . snd)
. fmap (\(x,y) -> ((x,y), m_projection
space x y)
. pairs
$ dimensions
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20110726/b1b5d625/attachment.htm>
More information about the Haskell-Cafe
mailing list