[Haskell-cafe] Library for Sparse Vectors?

dokondr dokondr at gmail.com
Wed Jul 27 00:27:23 CEST 2011


Thanks for the detailed reply and example!
Using IntMap as a vector seems to be a good idea.
In your example:
1)  I would use:

       dot = dot' * dot'
       dot'  = sum . elems . intersectionWith (*)
       norm = sum . fmap (**2) . elems

 instead of:

        dot  = sum . elems . intersectionWith (*)
        norm = (**0.5) . sum . fmap (**2) . elems

2)  I don't understand the syntax:
cosineSimilarity <$> lookup x space
                 <*> lookup y space

What are <$> and <*>?

Thanks,
Dmitri

On Wed, Jul 27, 2011 at 1:46 AM, Alexander Solla <alex.solla at gmail.com>wrote:

>
>
> On Tue, Jul 26, 2011 at 1:30 PM, dokondr <dokondr at gmail.com> wrote:
>
>> Hi,
>> Can't find on hackage any sparse vector library. Does such thing exist?
>> I need efficient storage and dot product calculation for very sparse
>> vectors with about 10 out of 40 000 non-zero components.
>> One solution would be to represent Sparse Vector as Data.Map with
>> (component_index, component_value) pairs to store non-zero components of the
>> vector.
>>
>
> I would make a different suggestion:
>
> Store a Set (or maybe an IntMap) of (IntMap scalar)s.
>
> In other words, let each vector be represented by an IntMap whose key
> represents the n'th component of the vector, and whose value is the proper
> scalar.
>
>
>> In this case, for example, calculating cosine similarity (
>> http://en.wikipedia.org/wiki/Cosine_similarity) for for every pair of 10
>> 000 vectors, would not be very nice and efficient, I am afraid.
>>
>
> Given two (IntMap Double)s a and b, I would compute the projection of a
> along b as
>
> cosineSimilarity :: IntMap Double -> IntMap Double -> Double
> cosineSimilarity a b  = (dot a b) / ((norm a) * (norm b)) where
>                  dot  = sum . elems . intersectionWith (*)
>                  norm = (**0.5) . sum . fmap (**2) . elems
>
>
> The only part I find tricky is enumerating all 10000^2 pairs
>
> pairs :: Int -> [Int]
> pairs dim = do
>    x <- [1..dim]
>    y <- [1..dim]
>    return (x,y)
>
> and computing the projection for each pair:
>
> projections :: Floating scalar => IntMap (IntMap scalar) -> Map (Int, Int)
> scalar
> projections space = let dimensions = undefined -- find max key in elements
> of space?
>                         m_projection space x y = cosineSimilarity <$>
> lookup x space
>                                                                   <*>
> lookup y space
>                      in fromList . filter (isMaybe . snd)
>                                  . fmap (\(x,y) -> ((x,y), m_projection
> space x y)
>                                  . pairs
>                                  $ dimensions
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20110727/84703b9e/attachment.htm>


More information about the Haskell-Cafe mailing list