<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">2018-04-03 0:18 GMT+02:00 Ivan Lazar Miljenovic <span dir="ltr"><<a href="mailto:ivan.miljenovic@gmail.com" target="_blank">ivan.miljenovic@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">(Sending this back to the libraries@ list as well.)<br>
<span class=""><br>
On 2 April 2018 at 23:59, Olivier S. <<a href="mailto:olivier.sohn@gmail.com">olivier.sohn@gmail.com</a>> wrote:<br>
><br>
> 2018-04-02 14:34 GMT+02:00 Ivan Lazar Miljenovic<br>
> <<a href="mailto:ivan.miljenovic@gmail.com">ivan.miljenovic@gmail.com</a>>:<br>
>><br>
>> On 2 April 2018 at 21:30, Olivier S. <<a href="mailto:olivier.sohn@gmail.com">olivier.sohn@gmail.com</a>> wrote:<br>
>> ><br>
>> > Hello,<br>
>> ><br>
>> > I'm resending this proposal, which is simplified w.r.t the first one,<br>
>> > and<br>
>> > where I removed a wrong analysis of a benchmark.<br>
>> ><br>
>> > - Proposal I : Optimize the time complexity of (key -> Maybe Vertex)<br>
>> > lookups<br>
>> > and graph creation when keys are Integral and consecutive.<br>
>> ><br>
>> > (The related PR for proposal I is [1], including benchmarks showing the<br>
>> > performance improvements.)<br>
>> ><br>
>> > Currently, (key -> Maybe Vertex) lookups returned by graphFromEdges<br>
>> > consist<br>
>> > of a binary search on an array, with a time complexity of O(log V) (I<br>
>> > will<br>
>> > use V for "Count of vertices", E for "Count of edges").<br>
>><br>
>> At the risk of bikeshedding, can you please use |V| and |E| to refer<br>
>> to the order and size of the graph respectively?<br>
><br>
><br>
> Do you mean in the haddock documentation for complexities or here? I don't<br>
> know which is mor readable, O( (V+E) * log V ) or O( (|V|+|E|) * log |V| ).<br>
> Anyway it would be a quick change in the PR, I'm not particularly attached<br>
> to the notation.<br>
<br>
</span>Both. |V| and |E| are more standard for this, as V and E represent<br>
the vertices and edges themselves.<br>
<div><div class="h5"><br>
>> > When key is Integral, and keys (of nodes passed to the graph creation<br>
>> > function) form a set of /consecutive/ values (for example : [4,5,6,7] or<br>
>> > [5,6,4,7]), we can have an O(1) lookup by substracting the value of the<br>
>> > smallest key, and checking for bounds.<br>
>><br>
>> I'm not sure I follow this part; are you ignoring order in these lists<br>
>> (you're referring to sets but using list notation)?<br>
>><br>
><br>
> I'm not ignoring the order, let me try to give a more precise definition:<br>
><br>
> keys is a list of consecutive keys iff it verifies:<br>
><br>
> -- (1) keys contains no duplicates<br>
> Set.size (Set.fromList keys) == length keys<br>
><br>
> -- (2) there is no "gap" between values, when sorted:<br>
> sort keys == [minimum keys .. maximum keys]<br>
><br>
><br>
> The O(1) lookup is at line 516 of Data/Graph.hs in<br>
> <a href="https://github.com/haskell/containers/pull/549/files" rel="noreferrer" target="_blank">https://github.com/haskell/<wbr>containers/pull/549/files</a> (key_vertex)<br>
><br>
>> ><br>
>> > Hence, graph creation complexity is improved, and user algorithms using<br>
>> > (key<br>
>> > -> Maybe Vertex) lookups will see their complexity reduced by a factor<br>
>> > of<br>
>> > up-to O(log V).<br>
>> ><br>
>> > The PR introduces this lookup and uses it for functions<br>
>> > graphFromEdgesWithConsecutiveK<wbr>eys and<br>
>> > graphFromEdgesWithConsecutiveA<wbr>scKeys.<br>
>> ><br>
>> > Here is a summary of complexities for (graph creation, lookup function)<br>
>> > as<br>
>> > they stand in the current state of the PR:<br>
>> ><br>
>> > - graphFromEdges (the currently existing function):<br>
>> > O( (V+E) * log V ), O(log V)<br>
>> > - graphFromEdgesWithConsecutiveK<wbr>eys (new function):<br>
>> > O( E + (V*log V) ), O(1)<br>
>> > - graphFromEdgesWithConsecutiveA<wbr>scKeys (new function) :<br>
>> > O( V+E ), O(1)<br>
>> ><br>
>> > - Proposal II : Deprecate `graphFromEdges` taking [(node, key, [key])]<br>
>> > in<br>
>> > favor of `graphFromMap` taking (Map key (node,[key]))<br>
>> ><br>
>> > If we pass the same key twice in the list we pass to 'graphFromEdges' it<br>
>> > is<br>
>> > undefined which node for that key will actually be used.<br>
>> > Introducing 'graphFromMap', taking a (Map key (node,[key]) would<br>
>> > alleviate<br>
>> > this issue, through the type used.<br>
>><br>
>> Off the top of my head, I'm not a big fan of this. If we're going to<br>
>> improve this, then I'd prefer to do so in such a way that allowed for<br>
>> usage with IntMap<br>
><br>
><br>
> Yes, IntMap seems to be better wrt performances than Map. Quoting the doc of<br>
> IntMap:<br>
><br>
> This data structure performs especially well on binary operations like union<br>
> and intersection. However, my benchmarks show that it is also (much) faster<br>
> on insertions and deletions when compared to a generic size-balanced map<br>
> implementation (see Data.Map).<br>
><br>
><br>
>><br>
>> (is there an existing type-class that covers<br>
>> association list-style data structures?).<br>
><br>
><br>
><br>
> There is the Map type-class (which I just discovered) in :<br>
><br>
> <a href="https://hackage.haskell.org/package/collections-api-1.0.0.0/docs/Data-Collections.html#g:4" rel="noreferrer" target="_blank">https://hackage.haskell.org/<wbr>package/collections-api-1.0.0.<wbr>0/docs/Data-Collections.html#<wbr>g:4</a><br>
<br>
</div></div>Except that it's in another library ;-)<br>
<span class=""><br></span></blockquote><div><br></div><div>So it seems using Data.IntMap would be a good compromise?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">
> With instances defined here, but only for Lazy versions Data.Map and<br>
> Data.IntMap:<br>
<br>
</span>Note that the data structures for the Lazy and Strict variants of<br>
[Int]Map are the same, it's just the strictness of the functions that<br>
operate on them that differ.<br></blockquote><div><br></div><div>That's interesting, I wasn't aware of this. </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="HOEnZb"><div class="h5"><br>
><br>
> <a href="https://hackage.haskell.org/package/collections-base-instances-1.0.0.0/docs/Data-Collections-BaseInstances.html" rel="noreferrer" target="_blank">https://hackage.haskell.org/<wbr>package/collections-base-<wbr>instances-1.0.0.0/docs/Data-<wbr>Collections-BaseInstances.html</a><br>
><br>
> Also, the type-class class doesn't have toAscList (or toList) functions<br>
> (which is what we would use in the implementation).<br>
><br>
> So if we want to rely on this we would need to implement toAscList, and<br>
> probably add instances for Strict maps (Data.IntMap.Strict, Data.Map.Strict)<br>
><br>
>><br>
>> Ideally you could also use<br>
>> HashMap from unordered-containers as well, but since we ultimately<br>
>> want `type Vertex = Int` I'm not sure if that's worth it; IntMap,<br>
>> however, is.<br>
>><br>
><br>
> I see another problem with HashMap : it doesn't provide a toAscList function<br>
> where the keys are sorted, so we would have to sort them, incurring a fixed<br>
> O(V log V) cost, whereas with Map and IntMap the user has the possibility to<br>
> create the map from an ascending list (fromAscList), in O(V) time and we can<br>
> get the list back also (toAscList) in O(V) time.<br>
><br>
>> ><br>
>> > Also, using a Map makes the implementation a bit more "natural" : there<br>
>> > is<br>
>> > no need for sorting by key, as Map.toAscList gives exactly the sorted<br>
>> > list<br>
>> > we want.<br>
>> ><br>
>> > We could also deprecate graphFromEdgesWithConsecutiveK<wbr>eys and<br>
>> > graphFromEdgesWithConsecutiveA<wbr>scKeys (introduced in proposal I) in favor<br>
>> > of<br>
>> > graphFromConsecutiveMap.<br>
>> ><br>
>> > About the naming, I propose two different schemes:<br>
>> ><br>
>> > Either:<br>
>> > - graphFromEdges (takes a List, deprecated, existing<br>
>> > function)<br>
>> > - graphFromEdgesInMap (takes a Map)<br>
>> > - graphFromEdgesInConsecutiveMap (takes a Map with consecutive keys)<br>
>> > Or:<br>
>> > - graphFromEdges (takes a List, deprecated, existing<br>
>> > function)<br>
>> > - graphFromMap<br>
>> > - graphFromConsecutiveMap<br>
>> > with these, to reflect the Map / List duality in the naming scheme:<br>
>> > - graphFromList (takes a List, deprecated, redirects<br>
>> > to<br>
>> > graphFromEdges)<br>
>> > - graphFromConsecutiveList (takes a List, deprecated, redirects<br>
>> > to<br>
>> > graphFromEdgesWithConsecutiveK<wbr>eys)<br>
>> > - graphFromConsecutiveAscList (takes a List, deprecated, redirects<br>
>> > to<br>
>> > graphFromEdgesWithConsecutiveA<wbr>scKeys)<br>
>> ><br>
>> > Cheers,<br>
>> > Olivier Sohn<br>
>> ><br>
>> > [1] <a href="https://github.com/haskell/containers/pull/549" rel="noreferrer" target="_blank">https://github.com/haskell/<wbr>containers/pull/549</a><br>
>> ><br>
>> ><br>
>> > ______________________________<wbr>_________________<br>
>> > Libraries mailing list<br>
>> > <a href="mailto:Libraries@haskell.org">Libraries@haskell.org</a><br>
>> > <a href="http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries" rel="noreferrer" target="_blank">http://mail.haskell.org/cgi-<wbr>bin/mailman/listinfo/libraries</a><br>
>> ><br>
>><br>
>><br>
>><br>
>> --<br>
>> Ivan Lazar Miljenovic<br>
>> <a href="mailto:Ivan.Miljenovic@gmail.com">Ivan.Miljenovic@gmail.com</a><br>
>> <a href="http://IvanMiljenovic.wordpress.com" rel="noreferrer" target="_blank">http://IvanMiljenovic.<wbr>wordpress.com</a><br>
><br>
><br>
<br>
<br>
<br>
--<br>
Ivan Lazar Miljenovic<br>
<a href="mailto:Ivan.Miljenovic@gmail.com">Ivan.Miljenovic@gmail.com</a><br>
<a href="http://IvanMiljenovic.wordpress.com" rel="noreferrer" target="_blank">http://IvanMiljenovic.<wbr>wordpress.com</a><br>
</div></div></blockquote></div><br></div></div>