<html><head></head><body>There's no benefit to optimizing default Foldable methods for data types that already have specialized methods.<br><br>foldMap' does not care about the nesting of the structure like foldl', so it's a better default choice.<br><br>What I worry more about is that getSum . foldl' (\ z x -> z <> Sum x) mempty is compiling to different code than foldl' (+) 0.<br>—<br>Sent from my phone with K-9 Mail.<br><br><div class="gmail_quote">On December 23, 2020 9:52:00 PM UTC, Viktor Dukhovni <ietf-dane@dukhovni.org> wrote:<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<pre dir="auto" class="k9mail">On Wed, Dec 23, 2020 at 02:08:51PM +0100, Merijn Verstraaten wrote:<br><br><blockquote class="gmail_quote" style="margin: 0pt 0pt 1ex 0.8ex; border-left: 1px solid #729fcf; padding-left: 1ex;">Also, note that the benchmark from the original email:<br>1) uses ghci, rendering it meaningless as it's not remotely<br>representative of performance of compiled code<br></blockquote><br>While indeed List does not use the default definition, when that<br>definition is used for List, the results with optimised compiled code<br>are even more stark. When summing [0..1_000_000_000] using the<br>foldMap'-based definition of `sum`:<br><br> main :: IO ()<br> main = getArgs >>= \case<br> [n] -> go (read n)<br> _ -> go 1_000_000_000<br> where<br> go :: Int -> IO ()<br> go n = print $ getSum #. F.foldMap' Sum $ [0..n]<br><br>The list elements end up allocated on the heap and I get:<br><br> $ ./sum1 +RTS -s<br> 500000000500000000<br> 72,000,051,888 bytes allocated in the heap<br> 571,120 bytes copied during GC<br> 44,376 bytes maximum residency (2 sample(s))<br> 29,352 bytes maximum slop<br> 5 MiB total memory in use (0 MB lost due to fragmentation)<br><br> INIT time 0.000s ( 0.000s elapsed)<br> MUT time 6.996s ( 6.988s elapsed)<br> GC time 0.036s ( 0.043s elapsed)<br> EXIT time 0.000s ( 0.002s elapsed)<br> Total time 7.033s ( 7.034s elapsed)<br><br>while with the List instance of foldl':<br><br> main :: IO ()<br> main = getArgs >>= \case<br> [n] -> go (read n)<br> _ -> go 1_000_000_000<br> where<br> go :: Int -> IO ()<br> go n = print $ F.foldl' (+) 0 $ [0..n]<br><br>the computation avoids heap allocation:<br><br> $ ./sum2 +RTS -s<br> 500000000500000000<br> 51,816 bytes allocated in the heap<br> 3,320 bytes copied during GC<br> 44,376 bytes maximum residency (1 sample(s))<br> 25,256 bytes maximum slop<br> 5 MiB total memory in use (0 MB lost due to fragmentation)<br><br> INIT time 0.000s ( 0.000s elapsed)<br> MUT time 0.346s ( 0.346s elapsed)<br> GC time 0.001s ( 0.001s elapsed)<br> EXIT time 0.000s ( 0.004s elapsed)<br> Total time 0.347s ( 0.351s elapsed)<br><br><blockquote class="gmail_quote" style="margin: 0pt 0pt 1ex 0.8ex; border-left: 1px solid #729fcf; padding-left: 1ex;">2) uses the list instance of Foldable which does *not* use the<br>defaults shown in the original email, but uses explicit definitions.<br></blockquote><br>Yes, the "List" instance of `sum` does not use the default definition.<br>That instance shows comparable performance for `sum` and `foldl'` when<br>compiled optimised.<br><br><blockquote class="gmail_quote" style="margin: 0pt 0pt 1ex 0.8ex; border-left: 1px solid #729fcf; padding-left: 1ex;">The foldMap' code shown (but not used!) *is* strict, so would,<br>presumably perform comparably to foldl'. Unlike the foldl version of<br>sum that Foldable for lists currently uses.<br></blockquote><br>So my question is basically whether the default is *generally* the more<br>appropriate choice. It clearly is not the more efficient choice for<br>Lists (more precisely, lazily generated iterators).<br><br>The difference mostly goes away when the data structure in question is<br>already fully realised in memory (as with e.g. strict maps, ...)<br><br>But I am skeptical that the `foldMap'` defintion is a better default,<br>are there real cases where it is actually better?<br></pre></blockquote></div></body></html>