Proposal for Data.Graph : Improve graph creation complexity when nodes have "consecutive" keys

Tue Apr 3 07:42:40 UTC 2018

Hello,

While we began discussing Proposal II with Ivan Lazar Miljenovic, I was
expecting first to discuss Proposal I, as my short-term goal is to get the
PR associated with Proposal I to be merged : https://github.com/haskell/
containers/pull/549

So please bring forward any remarks / concerns / approvals, for Proposal I
in priority!

Thank you,

Olivier

2018-04-02 13:30 GMT+02:00 Olivier S. <olivier.sohn at gmail.com>:

>
> Hello,
>
> I'm resending this proposal, which is simplified w.r.t the first one, and
> where I removed a wrong analysis of a benchmark.
>
> - Proposal I : Optimize the time complexity of (key -> Maybe Vertex)
> lookups and graph creation when keys are Integral and consecutive.
>
> (The related PR for proposal I is [1], including benchmarks showing the
> performance improvements.)
>
> Currently, (key -> Maybe Vertex) lookups returned by graphFromEdges
> consist of a binary search on an array, with a time complexity of O(log V)
> (I will use V for "Count of vertices", E for "Count of edges").
>
> When key is Integral, and keys (of nodes passed to the graph creation
> function) form a set of /consecutive/ values (for example : [4,5,6,7] or
> [5,6,4,7]), we can have an O(1) lookup by substracting the value of the
> smallest key, and checking for bounds.
>
> Hence, graph creation complexity is improved, and user algorithms using
> (key -> Maybe Vertex) lookups will see their complexity reduced by a factor
> of up-to O(log V).
>
> The PR introduces this lookup and uses it for functions
> graphFromEdgesWithConsecutiveKeys and graphFromEdgesWithConsecutiveA
> scKeys.
>
> Here is a summary of complexities for (graph creation, lookup function) as
> they stand in the current state of the PR:
>
> - graphFromEdges (the currently existing function):
> O( (V+E) * log V ), O(log V)
> - graphFromEdgesWithConsecutiveKeys (new function):
> O( E + (V*log V) ), O(1)
> - graphFromEdgesWithConsecutiveAscKeys (new function) :
> O( V+E ), O(1)
>
> - Proposal II : Deprecate `graphFromEdges` taking [(node, key, [key])] in
> favor of `graphFromMap` taking (Map key (node,[key]))
>
> If we pass the same key twice in the list we pass to 'graphFromEdges' it
> is undefined which node for that key will actually be used.
> Introducing 'graphFromMap', taking a (Map key (node,[key]) would alleviate
> this issue, through the type used.
>
> Also, using a Map makes the implementation a bit more "natural" : there is
> no need for sorting by key, as Map.toAscList gives exactly the sorted list
> we want.
>
> We could also deprecate graphFromEdgesWithConsecutiveKeys and
> graphFromEdgesWithConsecutiveAscKeys (introduced in proposal I) in favor
> of graphFromConsecutiveMap.
>
> About the naming, I propose two different schemes:
>
> Either:
>     - graphFromEdges                 (takes a List, deprecated, existing
> function)
>     - graphFromEdgesInMap            (takes a Map)
>     - graphFromEdgesInConsecutiveMap (takes a Map with consecutive keys)
> Or:
>     - graphFromEdges                 (takes a List, deprecated, existing
> function)
>     - graphFromMap
>     - graphFromConsecutiveMap
>  with these, to reflect the Map / List duality in the naming scheme:
>     - graphFromList               (takes a List, deprecated, redirects to
> graphFromEdges)
>     - graphFromConsecutiveList    (takes a List, deprecated, redirects to
> graphFromEdgesWithConsecutiveKeys)
>     - graphFromConsecutiveAscList (takes a List, deprecated, redirects to
> graphFromEdgesWithConsecutiveAscKeys)
>
> Cheers,
> Olivier Sohn
>
> [1] https://github.com/haskell/containers/pull/549
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/libraries/attachments/20180403/a0b83998/attachment.html>