Proposal for Data.Graph : Improve graph creation complexity when nodes have "consecutive" keys
Ivan Lazar Miljenovic
ivan.miljenovic at gmail.com
Mon Apr 2 22:18:32 UTC 2018
(Sending this back to the libraries@ list as well.)
On 2 April 2018 at 23:59, Olivier S. <olivier.sohn at gmail.com> wrote:
>
> 2018-04-02 14:34 GMT+02:00 Ivan Lazar Miljenovic
> <ivan.miljenovic at gmail.com>:
>>
>> On 2 April 2018 at 21:30, Olivier S. <olivier.sohn at gmail.com> wrote:
>> >
>> > Hello,
>> >
>> > I'm resending this proposal, which is simplified w.r.t the first one,
>> > and
>> > where I removed a wrong analysis of a benchmark.
>> >
>> > - Proposal I : Optimize the time complexity of (key -> Maybe Vertex)
>> > lookups
>> > and graph creation when keys are Integral and consecutive.
>> >
>> > (The related PR for proposal I is [1], including benchmarks showing the
>> > performance improvements.)
>> >
>> > Currently, (key -> Maybe Vertex) lookups returned by graphFromEdges
>> > consist
>> > of a binary search on an array, with a time complexity of O(log V) (I
>> > will
>> > use V for "Count of vertices", E for "Count of edges").
>>
>> At the risk of bikeshedding, can you please use |V| and |E| to refer
>> to the order and size of the graph respectively?
>
>
> Do you mean in the haddock documentation for complexities or here? I don't
> know which is mor readable, O( (V+E) * log V ) or O( (|V|+|E|) * log |V| ).
> Anyway it would be a quick change in the PR, I'm not particularly attached
> to the notation.
Both. |V| and |E| are more standard for this, as V and E represent
the vertices and edges themselves.
>> > When key is Integral, and keys (of nodes passed to the graph creation
>> > function) form a set of /consecutive/ values (for example : [4,5,6,7] or
>> > [5,6,4,7]), we can have an O(1) lookup by substracting the value of the
>> > smallest key, and checking for bounds.
>>
>> I'm not sure I follow this part; are you ignoring order in these lists
>> (you're referring to sets but using list notation)?
>>
>
> I'm not ignoring the order, let me try to give a more precise definition:
>
> keys is a list of consecutive keys iff it verifies:
>
> -- (1) keys contains no duplicates
> Set.size (Set.fromList keys) == length keys
>
> -- (2) there is no "gap" between values, when sorted:
> sort keys == [minimum keys .. maximum keys]
>
>
> The O(1) lookup is at line 516 of Data/Graph.hs in
> https://github.com/haskell/containers/pull/549/files (key_vertex)
>
>> >
>> > Hence, graph creation complexity is improved, and user algorithms using
>> > (key
>> > -> Maybe Vertex) lookups will see their complexity reduced by a factor
>> > of
>> > up-to O(log V).
>> >
>> > The PR introduces this lookup and uses it for functions
>> > graphFromEdgesWithConsecutiveKeys and
>> > graphFromEdgesWithConsecutiveAscKeys.
>> >
>> > Here is a summary of complexities for (graph creation, lookup function)
>> > as
>> > they stand in the current state of the PR:
>> >
>> > - graphFromEdges (the currently existing function):
>> > O( (V+E) * log V ), O(log V)
>> > - graphFromEdgesWithConsecutiveKeys (new function):
>> > O( E + (V*log V) ), O(1)
>> > - graphFromEdgesWithConsecutiveAscKeys (new function) :
>> > O( V+E ), O(1)
>> >
>> > - Proposal II : Deprecate `graphFromEdges` taking [(node, key, [key])]
>> > in
>> > favor of `graphFromMap` taking (Map key (node,[key]))
>> >
>> > If we pass the same key twice in the list we pass to 'graphFromEdges' it
>> > is
>> > undefined which node for that key will actually be used.
>> > Introducing 'graphFromMap', taking a (Map key (node,[key]) would
>> > alleviate
>> > this issue, through the type used.
>>
>> Off the top of my head, I'm not a big fan of this. If we're going to
>> improve this, then I'd prefer to do so in such a way that allowed for
>> usage with IntMap
>
>
> Yes, IntMap seems to be better wrt performances than Map. Quoting the doc of
> IntMap:
>
> This data structure performs especially well on binary operations like union
> and intersection. However, my benchmarks show that it is also (much) faster
> on insertions and deletions when compared to a generic size-balanced map
> implementation (see Data.Map).
>
>
>>
>> (is there an existing type-class that covers
>> association list-style data structures?).
>
>
>
> There is the Map type-class (which I just discovered) in :
>
> https://hackage.haskell.org/package/collections-api-1.0.0.0/docs/Data-Collections.html#g:4
Except that it's in another library ;-)
> With instances defined here, but only for Lazy versions Data.Map and
> Data.IntMap:
Note that the data structures for the Lazy and Strict variants of
[Int]Map are the same, it's just the strictness of the functions that
operate on them that differ.
>
> https://hackage.haskell.org/package/collections-base-instances-1.0.0.0/docs/Data-Collections-BaseInstances.html
>
> Also, the type-class class doesn't have toAscList (or toList) functions
> (which is what we would use in the implementation).
>
> So if we want to rely on this we would need to implement toAscList, and
> probably add instances for Strict maps (Data.IntMap.Strict, Data.Map.Strict)
>
>>
>> Ideally you could also use
>> HashMap from unordered-containers as well, but since we ultimately
>> want `type Vertex = Int` I'm not sure if that's worth it; IntMap,
>> however, is.
>>
>
> I see another problem with HashMap : it doesn't provide a toAscList function
> where the keys are sorted, so we would have to sort them, incurring a fixed
> O(V log V) cost, whereas with Map and IntMap the user has the possibility to
> create the map from an ascending list (fromAscList), in O(V) time and we can
> get the list back also (toAscList) in O(V) time.
>
>> >
>> > Also, using a Map makes the implementation a bit more "natural" : there
>> > is
>> > no need for sorting by key, as Map.toAscList gives exactly the sorted
>> > list
>> > we want.
>> >
>> > We could also deprecate graphFromEdgesWithConsecutiveKeys and
>> > graphFromEdgesWithConsecutiveAscKeys (introduced in proposal I) in favor
>> > of
>> > graphFromConsecutiveMap.
>> >
>> > About the naming, I propose two different schemes:
>> >
>> > Either:
>> > - graphFromEdges (takes a List, deprecated, existing
>> > function)
>> > - graphFromEdgesInMap (takes a Map)
>> > - graphFromEdgesInConsecutiveMap (takes a Map with consecutive keys)
>> > Or:
>> > - graphFromEdges (takes a List, deprecated, existing
>> > function)
>> > - graphFromMap
>> > - graphFromConsecutiveMap
>> > with these, to reflect the Map / List duality in the naming scheme:
>> > - graphFromList (takes a List, deprecated, redirects
>> > to
>> > graphFromEdges)
>> > - graphFromConsecutiveList (takes a List, deprecated, redirects
>> > to
>> > graphFromEdgesWithConsecutiveKeys)
>> > - graphFromConsecutiveAscList (takes a List, deprecated, redirects
>> > to
>> > graphFromEdgesWithConsecutiveAscKeys)
>> >
>> > Cheers,
>> > Olivier Sohn
>> >
>> > [1] https://github.com/haskell/containers/pull/549
>> >
>> >
>> > _______________________________________________
>> > Libraries mailing list
>> > Libraries at haskell.org
>> > http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>> >
>>
>>
>>
>> --
>> Ivan Lazar Miljenovic
>> Ivan.Miljenovic at gmail.com
>> http://IvanMiljenovic.wordpress.com
>
>
--
Ivan Lazar Miljenovic
Ivan.Miljenovic at gmail.com
http://IvanMiljenovic.wordpress.com
More information about the Libraries
mailing list