Potential space leak in transpose
David Feuer
david.feuer at gmail.com
Fri May 15 16:58:59 UTC 2020
Carter, imagine the list elements are thunks and that each of them, when
forced, produces a large structure (e.g., a vector of length 10^7). Does
that help your imagination?
On Fri, May 15, 2020, 12:41 PM Carter Schonwald <carter.schonwald at gmail.com>
wrote:
> David : call me unimaginative, but can you share an example that can
> trigger the space
> Leak ? At least conceptually I’m being a tad thick about it though I think
> I see what you’re pointing at.
>
> On Fri, May 15, 2020 at 4:24 AM Ryan Reich <ryan.reich at gmail.com> wrote:
>
>> Never mind, that one is broken. I should probably not write these late
>> in the evening.
>>
>> On Thu, May 14, 2020 at 10:30 PM Ryan Reich <ryan.reich at gmail.com> wrote:
>>
>>> So it doesn't, but this one does (*):
>>>
>>> ---
>>> transpose :: [[a]] -> [[a]]
>>> transpose [] = []
>>> transpose [r] = map (:[]) r
>>> transpose (r1 : rrest) = zip' r1 (transpose rrest)
>>> where
>>> zip' [] bs = bs
>>> zip' (a : as) ~(b : bs) = (a : b) : zip' as bs
>>>
>>> main = do
>>> mapM_ (print . transpose)
>>> [
>>> [[1,2,3]],
>>> [[1,2,3],[4,5,6]],
>>> [[1,2,3],[4,5,6],[7,8,9]],
>>> [[10,11],[20],[],[30,31,32]]
>>> ]
>>> print $ take 10 $ transpose [[0 ..]]
>>> --print $ map (take 10) $ transpose (repeat [0])
>>> print $ take 3 $ map (take 3) $ transpose (repeat [0..])
>>> ---
>>>
>>> It's true that this implementation will make the commented line
>>> diverge. *However*, (*) the stock implementation of transpose also
>>> diverges! Try it: head <$> transpose (repeat [0]) in ghci will print [0
>>> and then loop. So I think this one is okay by that standard. Does it do
>>> better at allocation than the existing one or the one with unzip?
>>>
>>> <rant, possibly beside the point>
>>> It's actually not even well-defined what the transpose would be if we
>>> allow infinitely many rows and also use the "condensing" property where the
>>> columns skip over nonexistent terms in the rows. How would you even prove
>>> that transpose (repeat [0]) has length 1? You'd have to rule out the
>>> existence of any second term in any of the infinitely many rows. Of
>>> course, in this example there is a fast way to do that, but the rows could
>>> be anything, so the problem is uncomputable. This logic is also why the
>>> final test above actually does terminate: because we find the requisite
>>> number of elements early on (in fact, instantly, since the rows are all
>>> infinite).
>>>
>>> This actually raises the question of whether transpose even "should"
>>> care about infinitely many rows or columns (I know, the behavior is
>>> standardized; pretend we're discussing a new alternative transpose' that
>>> explicitly only does finite lists).
>>> </rant>
>>>
>>> Ryan
>>>
>>> On Thu, May 14, 2020 at 5:15 PM David Feuer <david.feuer at gmail.com>
>>> wrote:
>>>
>>>> That doesn't look like it works on infinite lists.
>>>>
>>>> On Thu, May 14, 2020, 8:09 PM Ryan Reich <ryan.reich at gmail.com> wrote:
>>>>
>>>>> Hopefully I'm not fooling myself...how about this?
>>>>>
>>>>> ---
>>>>> transpose :: [[a]] -> [[a]]
>>>>> transpose [] = []
>>>>> transpose (r1 : rrest) = zipWith' (:) (:[]) id r1 (transpose rrest)
>>>>>
>>>>> zipWith' :: (a -> b -> c) -> (a -> c) -> (b -> c) -> ([a] -> [b] ->
>>>>> [c])
>>>>> zipWith' _ _ fb [] bs = fb <$> bs
>>>>> zipWith' _ fa _ as [] = fa <$> as
>>>>> zipWith' f fa fb (a : as) (b : bs) = f a b : zipWith' f fa fb as bs
>>>>>
>>>>> main = do
>>>>> mapM_ (print . transpose)
>>>>> [
>>>>> [[1,2,3]],
>>>>> [[1,2,3],[4,5,6]],
>>>>> [[1,2,3],[4,5,6],[7,8,9]],
>>>>> [[10,11],[20],[],[30,31,32]]
>>>>> ]
>>>>> ---
>>>>>
>>>>> I see the output:
>>>>> ---
>>>>> [[1],[2],[3]]
>>>>> [[1,4],[2,5],[3,6]]
>>>>> [[1,4,7],[2,5,8],[3,6,9]]
>>>>> [[10,20,30],[11,31],[32]]
>>>>> ---
>>>>>
>>>>> which is correct (the last example is the one from the haddocks).
>>>>>
>>>>> I was concerned that the definition of zipWith' (which is akin to the
>>>>> Map function unionWith) is not compatible with the build/fold fusion rule,
>>>>> but the implementation of zipWith itself is basically the same.
>>>>>
>>>>> Ryan
>>>>>
>>>>> On Thu, May 14, 2020 at 1:17 PM David Feuer <david.feuer at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Right. I still think it might be the right thing to do, though. I'm
>>>>>> not a big fan of general-purpose library functions that have any
>>>>>> unnecessary memory leak hazard. The biggest counterargument is that real
>>>>>> code is unlikely to run into that problem.
>>>>>>
>>>>>> On Thu, May 14, 2020, 3:35 PM Ryan Reich <ryan.reich at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> My suggestion was much less sophisticated even than that, and is
>>>>>>> basically what David answered with fusion. Also according to his answer,
>>>>>>> the original code of transpose lacks the laziness that unzip's actual
>>>>>>> implementation would provide.
>>>>>>>
>>>>>>> I think what that means is that his concern over allocating extra
>>>>>>> pairs is about the ones created internally by unzip when it builds the lazy
>>>>>>> heads-and-tails accessors.
>>>>>>>
>>>>>>> On Thu, May 14, 2020, 03:27 Andreas Abel <andreas.abel at ifi.lmu.de>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On 2020-05-13 22:27, Ryan Reich wrote:
>>>>>>>> > Why not just inline the definition of unzip and hand-optimize
>>>>>>>> away the
>>>>>>>> > pairs?
>>>>>>>>
>>>>>>>> Isn't this what the original code of transpose is doing?
>>>>>>>>
>>>>>>>> > On Tue, May 12, 2020, 10:24 David Feuer <david.feuer at gmail.com
>>>>>>>> > <mailto:david.feuer at gmail.com>> wrote:
>>>>>>>> >
>>>>>>>> > Also, the more eager list allocation can increase residency,
>>>>>>>> but I
>>>>>>>> > don't think it can cause a leak.
>>>>>>>> >
>>>>>>>> > On Tue, May 12, 2020, 9:48 AM David Feuer <
>>>>>>>> david.feuer at gmail.com
>>>>>>>> > <mailto:david.feuer at gmail.com>> wrote:
>>>>>>>> >
>>>>>>>> > The cost of allocating the extra pairs.
>>>>>>>> >
>>>>>>>> > On Tue, May 12, 2020, 5:11 AM Andreas Abel
>>>>>>>> > <andreas.abel at ifi.lmu.de <mailto:andreas.abel at ifi.lmu.de>>
>>>>>>>> wrote:
>>>>>>>> >
>>>>>>>> > > I don't know how much that'll cost in practice.
>>>>>>>> >
>>>>>>>> > What costs are you worried about?
>>>>>>>> >
>>>>>>>> > On 2020-05-12 00:08, David Feuer wrote:
>>>>>>>> > > In Data.List, we define
>>>>>>>> > >
>>>>>>>> > > transpose :: [[a]] -> [[a]]
>>>>>>>> > > transpose [] = []
>>>>>>>> > > transpose ([] : xss) = transpose xss
>>>>>>>> > > transpose ((x : xs) : xss) = (x : [h | (h : _) <-
>>>>>>>> xss]) :
>>>>>>>> > transpose (xs
>>>>>>>> > > : [t | (_ : t) <- xss])
>>>>>>>> > >
>>>>>>>> > > The potential difficulty is that we essentially
>>>>>>>> mapMaybe
>>>>>>>> > over the xss
>>>>>>>> > > list twice in the third case. So we hang on to the
>>>>>>>> heads
>>>>>>>> > where we need
>>>>>>>> > > the tails and the tails where we need the heads.
>>>>>>>> We could
>>>>>>>> > fix that with
>>>>>>>> > > something like this:
>>>>>>>> > >
>>>>>>>> > > transpose ((x : xs) : xss) = (x : fronts) :
>>>>>>>> transpose (xs
>>>>>>>> > : rears)
>>>>>>>> > > where
>>>>>>>> > > (fronts, rears) = unzip [(h,t) | (h : t) <-
>>>>>>>> xss]
>>>>>>>> > >
>>>>>>>> > > I don't know how much that'll cost in practice.
>>>>>>>> > >
>>>>>>>> > > _______________________________________________
>>>>>>>> > > Libraries mailing list
>>>>>>>> > > Libraries at haskell.org <mailto:
>>>>>>>> Libraries at haskell.org>
>>>>>>>> > >
>>>>>>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>>>>>>>> > >
>>>>>>>> >
>>>>>>>> > _______________________________________________
>>>>>>>> > Libraries mailing list
>>>>>>>> > Libraries at haskell.org <mailto:Libraries at haskell.org>
>>>>>>>> > http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > _______________________________________________
>>>>>>>> > Libraries mailing list
>>>>>>>> > Libraries at haskell.org
>>>>>>>> > http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>>>>>>>> >
>>>>>>>>
>>>>>>> _______________________________________________
>> Libraries mailing list
>> Libraries at haskell.org
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/libraries/attachments/20200515/faf2142c/attachment.html>
More information about the Libraries
mailing list