<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">2018-04-03 0:18 GMT+02:00 Ivan Lazar Miljenovic <span dir="ltr"><<a href="mailto:ivan.miljenovic@gmail.com" target="_blank">ivan.miljenovic@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">(Sending this back to the libraries@ list as well.)<br>

<span class=""><br>

On 2 April 2018 at 23:59, Olivier S. <<a href="mailto:olivier.sohn@gmail.com">olivier.sohn@gmail.com</a>> wrote:<br>

><br>

> 2018-04-02 14:34 GMT+02:00 Ivan Lazar Miljenovic<br>

> <<a href="mailto:ivan.miljenovic@gmail.com">ivan.miljenovic@gmail.com</a>>:<br>

>><br>

>> On 2 April 2018 at 21:30, Olivier S. <<a href="mailto:olivier.sohn@gmail.com">olivier.sohn@gmail.com</a>> wrote:<br>

>> ><br>

>> > Hello,<br>

>> ><br>

>> > I'm resending this proposal, which is simplified w.r.t. the first one,<br>

>> > and<br>

>> > where I removed a wrong analysis of a benchmark.<br>

>> ><br>

>> > - Proposal I : Optimize the time complexity of (key -> Maybe Vertex)<br>

>> > lookups<br>

>> > and graph creation when keys are Integral and consecutive.<br>

>> ><br>

>> > (The related PR for proposal I is [1], including benchmarks showing the<br>

>> > performance improvements.)<br>

>> ><br>

>> > Currently, (key -> Maybe Vertex) lookups returned by graphFromEdges<br>

>> > consist<br>

>> > of a binary search on an array, with a time complexity of O(log V) (I<br>

>> > will<br>

>> > use V for "Count of vertices", E for "Count of edges").<br>

>><br>

>> At the risk of bikeshedding, can you please use |V| and |E| to refer<br>

>> to the order and size of the graph respectively?<br>

><br>

><br>

> Do you mean in the haddock documentation for complexities or here? I don't<br>

> know which is more readable, O( (V+E) * log V ) or O( (|V|+|E|) * log |V| ).<br>

> Anyway it would be a quick change in the PR, I'm not particularly attached<br>

> to the notation.<br>

<br>

</span>Both.  |V| and |E| are more standard for this, as V and E represent<br>

the vertices and edges themselves.<br>

<div><div class="h5"><br>

>> > When key is Integral, and keys (of nodes passed to the graph creation<br>

>> > function) form a set of /consecutive/ values (for example : [4,5,6,7] or<br>

>> > [5,6,4,7]), we can have an O(1) lookup by subtracting the value of the<br>

>> > smallest key, and checking for bounds.<br>

>><br>

>> I'm not sure I follow this part; are you ignoring order in these lists<br>

>> (you're referring to sets but using list notation)?<br>

>><br>

><br>

> I'm not ignoring the order, let me try to give a more precise definition:<br>

><br>

> keys is a list of consecutive keys iff it satisfies both conditions:<br>

><br>

> -- (1) keys contains no duplicates<br>

> Set.size (Set.fromList keys) == length keys<br>

><br>

> -- (2) there is no "gap" between values, when sorted:<br>

> sort keys == [minimum keys .. maximum keys]<br>
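The two conditions above can be combined into a single predicate. This is only a minimal sketch: the name isConsecutive is hypothetical (it is not part of containers or of the PR), and treating the empty list as consecutive is an assumption, since the definition above does not cover that case.<br>

```haskell
import Data.List (sort)
import qualified Data.Set as Set

-- | Hypothetical helper, for illustration only.
-- True when the keys contain no duplicates and leave no gap once sorted.
-- Assumption: the empty list counts as consecutive.
isConsecutive :: Integral k => [k] -> Bool
isConsecutive []   = True
isConsecutive keys = noDuplicates && noGaps
  where
    -- (1) keys contains no duplicates
    noDuplicates = Set.size (Set.fromList keys) == length keys
    -- (2) there is no gap between values, when sorted
    noGaps       = sort keys == [minimum keys .. maximum keys]
```

With this definition, both [4,5,6,7] and [5,6,4,7] qualify, while [4,6,7] (gap) and [4,4,5] (duplicate) do not.<br>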

><br>

><br>

> The O(1) lookup is at line 516 of Data/Graph.hs in<br>

> <a href="https://github.com/haskell/containers/pull/549/files" rel="noreferrer" target="_blank">https://github.com/haskell/<wbr>containers/pull/549/files</a> (key_vertex)<br>
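For readers who do not want to open the PR, the idea behind the constant-time lookup can be sketched as follows. This is not the PR's key_vertex code: the function name and its parameters (smallest key, vertex count) are assumptions, and it presumes that vertices are numbered 0..n-1 in ascending key order.<br>

```haskell
-- | Hypothetical sketch of an O(1) key-to-vertex lookup for consecutive
-- Integral keys: subtract the smallest key, then check the bounds.
-- 'minKey' is the smallest key, 'n' the number of vertices (both assumed
-- to be known once the graph has been built).
consecutiveKeyVertex :: Integral k => k -> Int -> k -> Maybe Int
consecutiveKeyVertex minKey n key
  | offset < 0           = Nothing                  -- below the smallest key
  | offset >= toInteger n = Nothing                 -- above the largest key
  | otherwise            = Just (fromInteger offset)
  where
    -- Computed in Integer so the subtraction cannot overflow.
    offset = toInteger key - toInteger minKey
```

For keys [4,5,6,7], the smallest key is 4 and n is 4, so key 6 maps to vertex 2 and key 8 is rejected.<br>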

><br>

>> ><br>

>> > Hence, graph creation complexity is improved, and user algorithms using<br>

>> > (key<br>

>> > -> Maybe Vertex) lookups will see their complexity reduced by a factor<br>

>> > of<br>

>> > up to O(log V).<br>

>> ><br>

>> > The PR introduces this lookup and uses it for functions<br>

>> > graphFromEdgesWithConsecutiveK<wbr>eys and<br>

>> > graphFromEdgesWithConsecutiveA<wbr>scKeys.<br>

>> ><br>

>> > Here is a summary of complexities for (graph creation, lookup function)<br>

>> > as<br>

>> > they stand in the current state of the PR:<br>

>> ><br>

>> > - graphFromEdges (the currently existing function):<br>

>> > O( (V+E) * log V ), O(log V)<br>

>> > - graphFromEdgesWithConsecutiveK<wbr>eys (new function):<br>

>> > O( E + (V*log V) ), O(1)<br>

>> > - graphFromEdgesWithConsecutiveA<wbr>scKeys (new function) :<br>

>> > O( V+E ), O(1)<br>
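As a point of reference for the summary above, here is how the existing graphFromEdges and its (key -> Maybe Vertex) lookup are used today. The node labels and keys are made-up example data; only graphFromEdges and vertices from Data.Graph are real API.<br>

```haskell
import Data.Graph (Graph, Vertex, graphFromEdges, vertices)

-- Example data (hypothetical): (node, key, keys of successor nodes).
exampleEdges :: [(String, Int, [Int])]
exampleEdges =
  [ ("a", 4, [5])
  , ("b", 5, [6, 7])
  , ("c", 6, [])
  , ("d", 7, [4])
  ]

-- The existing function returns the graph plus two lookup functions.
example :: (Graph, Vertex -> (String, Int, [Int]), Int -> Maybe Vertex)
example = graphFromEdges exampleEdges

-- The third component is the (key -> Maybe Vertex) lookup discussed here.
lookupVertex :: Int -> Maybe Vertex
lookupVertex = keyToVertex
  where
    (_, _, keyToVertex) = example
```

The keys here happen to be consecutive (4 to 7), which is exactly the case the new functions in the PR are meant to speed up.<br>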

>> ><br>

>> > - Proposal II : Deprecate `graphFromEdges` taking [(node, key, [key])]<br>

>> > in<br>

>> > favor of `graphFromMap` taking (Map key (node,[key]))<br>

>> ><br>

>> > If we pass the same key twice in the list we pass to 'graphFromEdges' it<br>

>> > is<br>

>> > undefined which node for that key will actually be used.<br>

>> > Introducing 'graphFromMap', taking a (Map key (node,[key])), would<br>

>> > alleviate<br>

>> > this issue, through the type used.<br>
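A rough sketch of what such a function could look like is given below. Only the name graphFromMap and the argument type come from the proposal; the body is an assumption that simply delegates to the existing graphFromEdges (so it still sorts internally and gains nothing in speed). It is meant to show how the Map type rules out duplicate keys, not how the PR implements it.<br>

```haskell
import Data.Graph (Graph, Vertex, graphFromEdges)
import qualified Data.Map as Map

-- | Hypothetical implementation of the proposed 'graphFromMap'.
-- A Map holds at most one value per key, so the choice of node for a key
-- is made when the Map is built, not left undefined by graph creation.
graphFromMap
  :: Ord key
  => Map.Map key (node, [key])
  -> (Graph, Vertex -> (node, key, [key]), key -> Maybe Vertex)
graphFromMap m =
  graphFromEdges [ (node, key, outs) | (key, (node, outs)) <- Map.toAscList m ]
```

If the caller builds the Map with Map.fromList and repeats a key, the documented behaviour of fromList (the last value for a key is retained) decides which node wins, before the graph is ever constructed.<br>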

>><br>

>> Off the top of my head, I'm not a big fan of this.  If we're going to<br>

>> improve this, then I'd prefer to do so in such a way that allowed for<br>

>> usage with IntMap<br>

><br>

><br>

> Yes, IntMap seems to be better than Map with respect to performance. Quoting the doc of<br>

> IntMap:<br>

><br>

> This data structure performs especially well on binary operations like union<br>

> and intersection. However, my benchmarks show that it is also (much) faster<br>

> on insertions and deletions when compared to a generic size-balanced map<br>

> implementation (see Data.Map).<br>

><br>

><br>

>><br>

>> (is there an existing type-class that covers<br>

>> association list-style data structures?).<br>

><br>

><br>

><br>

> There is the Map type-class (which I just discovered) in :<br>

><br>

> <a href="https://hackage.haskell.org/package/collections-api-1.0.0.0/docs/Data-Collections.html#g:4" rel="noreferrer" target="_blank">https://hackage.haskell.org/<wbr>package/collections-api-1.0.0.<wbr>0/docs/Data-Collections.html#<wbr>g:4</a><br>

<br>

</div></div>Except that it's in another library ;-)<br>

<span class=""><br></span></blockquote><div><br></div><div>So it seems using Data.IntMap would be a good compromise?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">

> With instances defined here, but only for the Lazy versions of Data.Map and<br>

> Data.IntMap:<br>

<br>

</span>Note that the data structures for the Lazy and Strict variants of<br>

[Int]Map are the same, it's just the strictness of the functions that<br>

operate on them that differ.<br></blockquote><div><br></div><div>That's interesting, I wasn't aware of this. </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div class="HOEnZb"><div class="h5"><br>

><br>

> <a href="https://hackage.haskell.org/package/collections-base-instances-1.0.0.0/docs/Data-Collections-BaseInstances.html" rel="noreferrer" target="_blank">https://hackage.haskell.org/<wbr>package/collections-base-<wbr>instances-1.0.0.0/docs/Data-<wbr>Collections-BaseInstances.html</a><br>

><br>

> Also, the type class doesn't have toAscList (or toList) functions<br>

> (which is what we would use in the implementation).<br>

><br>

> So if we want to rely on this we would need to implement toAscList, and<br>

> probably add instances for Strict maps (Data.IntMap.Strict, Data.Map.Strict)<br>

><br>

>><br>

>>   Ideally you could also use<br>

>> HashMap from unordered-containers as well, but since we ultimately<br>

>> want `type Vertex = Int` I'm not sure if that's worth it; IntMap,<br>

>> however, is.<br>

>><br>

><br>

> I see another problem with HashMap : it doesn't provide a toAscList function<br>

> where the keys are sorted, so we would have to sort them, incurring a fixed<br>

> O(V log V) cost, whereas with Map and IntMap the user has the possibility to<br>

> create the map from an ascending list (fromAscList), in O(V) time and we can<br>

> get the list back also (toAscList) in O(V) time.<br>
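The round trip described above can be illustrated with IntMap. The data is made up; fromAscList and toAscList are existing functions of Data.IntMap.Strict, and fromAscList has the precondition that the input keys are already in ascending order.<br>

```haskell
import qualified Data.IntMap.Strict as IntMap

-- Example data (hypothetical): key -> (node, keys of successor nodes).
-- The input list is already sorted by key, as 'fromAscList' requires.
nodes :: IntMap.IntMap (String, [Int])
nodes = IntMap.fromAscList
  [ (4, ("a", [5]))
  , (5, ("b", [6]))
  , (6, ("c", []))
  ]

-- Reading the entries back in ascending key order needs no extra sort.
sortedNodes :: [(Int, (String, [Int]))]
sortedNodes = IntMap.toAscList nodes
```

A HashMap offers no ordered traversal of this kind, which is why it would need the additional sort mentioned above.<br>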

><br>

>> ><br>

>> > Also, using a Map makes the implementation a bit more "natural" : there<br>

>> > is<br>

>> > no need for sorting by key, as Map.toAscList gives exactly the sorted<br>

>> > list<br>

>> > we want.<br>

>> ><br>

>> > We could also deprecate graphFromEdgesWithConsecutiveK<wbr>eys and<br>

>> > graphFromEdgesWithConsecutiveA<wbr>scKeys (introduced in proposal I) in favor<br>

>> > of<br>

>> > graphFromConsecutiveMap.<br>

>> ><br>

>> > About the naming, I propose two different schemes:<br>

>> ><br>

>> > Either:<br>

>> >     - graphFromEdges                 (takes a List, deprecated, existing<br>

>> > function)<br>

>> >     - graphFromEdgesInMap            (takes a Map)<br>

>> >     - graphFromEdgesInConsecutiveMap (takes a Map with consecutive keys)<br>

>> > Or:<br>

>> >     - graphFromEdges                 (takes a List, deprecated, existing<br>

>> > function)<br>

>> >     - graphFromMap<br>

>> >     - graphFromConsecutiveMap<br>

>> >  with these, to reflect the Map / List duality in the naming scheme:<br>

>> >     - graphFromList               (takes a List, deprecated, redirects<br>

>> > to<br>

>> > graphFromEdges)<br>

>> >     - graphFromConsecutiveList    (takes a List, deprecated, redirects<br>

>> > to<br>

>> > graphFromEdgesWithConsecutiveK<wbr>eys)<br>

>> >     - graphFromConsecutiveAscList (takes a List, deprecated, redirects<br>

>> > to<br>

>> > graphFromEdgesWithConsecutiveA<wbr>scKeys)<br>

>> ><br>

>> > Cheers,<br>

>> > Olivier Sohn<br>

>> ><br>

>> > [1] <a href="https://github.com/haskell/containers/pull/549" rel="noreferrer" target="_blank">https://github.com/haskell/<wbr>containers/pull/549</a><br>

>> ><br>

>> ><br>


>> ><br>

>><br>

>><br>

>><br>

>> --<br>

>> Ivan Lazar Miljenovic<br>

>> <a href="mailto:Ivan.Miljenovic@gmail.com">Ivan.Miljenovic@gmail.com</a><br>

>> <a href="http://IvanMiljenovic.wordpress.com" rel="noreferrer" target="_blank">http://IvanMiljenovic.<wbr>wordpress.com</a><br>

><br>

><br>

<br>

<br>

<br>

--<br>

Ivan Lazar Miljenovic<br>

<a href="mailto:Ivan.Miljenovic@gmail.com">Ivan.Miljenovic@gmail.com</a><br>

<a href="http://IvanMiljenovic.wordpress.com" rel="noreferrer" target="_blank">http://IvanMiljenovic.<wbr>wordpress.com</a><br>

</div></div></blockquote></div><br></div></div>