[Haskell-cafe] Library for Sparse Vectors?
dokondr
dokondr at gmail.com
Wed Jul 27 00:27:23 CEST 2011
Thanks for the detailed reply and example!
Using IntMap as a vector seems to be a good idea.
In your example:
1) I would use:
dot = dot' * dot'
dot' = sum . elems . intersectionWith (*)
norm = sum . fmap (**2) . elems
instead of:
dot = sum . elems . intersectionWith (*)
norm = (**0.5) . sum . fmap (**2) . elems
2) I don't understand the syntax:
cosineSimilarity <$> lookup x space
<*> lookup y space
What are <$> and <*>?
Thanks,
Dmitri
On Wed, Jul 27, 2011 at 1:46 AM, Alexander Solla <alex.solla at gmail.com>wrote:
>
>
> On Tue, Jul 26, 2011 at 1:30 PM, dokondr <dokondr at gmail.com> wrote:
>
>> Hi,
>> Can't find on hackage any sparse vector library. Does such thing exist?
>> I need efficient storage and dot product calculation for very sparse
>> vectors with about 10 out of 40 000 non-zero components.
>> One solution would be to represent Sparse Vector as Data.Map with
>> (component_index, component_value) pairs to store non-zero components of the
>> vector.
>>
>
> I would make a different suggestion:
>
> Store a Set (or maybe an IntMap) of (IntMap scalar)s.
>
> In other words, let each vector be represented by an IntMap whose key
> represents the n'th component of the vector, and whose value is the proper
> scalar.
>
>
>> In this case, for example, calculating cosine similarity (
>> http://en.wikipedia.org/wiki/Cosine_similarity) for for every pair of 10
>> 000 vectors, would not be very nice and efficient, I am afraid.
>>
>
> Given two (IntMap Double)s a and b, I would compute the projection of a
> along b as
>
> cosineSimilarity :: IntMap Double -> IntMap Double -> Double
> cosineSimilarity a b = (dot a b) / ((norm a) * (norm b)) where
> dot = sum . elems . intersectionWith (*)
> norm = (**0.5) . sum . fmap (**2) . elems
>
>
> The only part I find tricky is enumerating all 10000^2 pairs
>
> pairs :: Int -> [Int]
> pairs dim = do
> x <- [1..dim]
> y <- [1..dim]
> return (x,y)
>
> and computing the projection for each pair:
>
> projections :: Floating scalar => IntMap (IntMap scalar) -> Map (Int, Int)
> scalar
> projections space = let dimensions = undefined -- find max key in elements
> of space?
> m_projection space x y = cosineSimilarity <$>
> lookup x space
> <*>
> lookup y space
> in fromList . filter (isMaybe . snd)
> . fmap (\(x,y) -> ((x,y), m_projection
> space x y)
> . pairs
> $ dimensions
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20110727/84703b9e/attachment.htm>
More information about the Haskell-Cafe
mailing list