Add NonEmptyMap and NonEmptySet to containers

David Feuer david.feuer at gmail.com
Thu Apr 25 18:04:33 UTC 2019


Oh, I see what you mean. But then containers would depend on GADTs, which
is rather counter to tradition. And I don't really see the benefit over a
newtype and smart constructors around the existing Seq type.

On Thu, Apr 25, 2019, 1:41 PM Zemyla <zemyla at gmail.com> wrote:

> We could have a GADT for the FingerTree:
>
> data Emptiness = E | NE
>
> data FingerTree e a where
>   EmptyT :: FingerTree 'E a
>   SingleT :: a -> FingerTree e a
>   DeepT :: !(Digit a) -> FingerTree e' (Node a) -> !(Digit a) ->
> FingerTree e a
>
> And because Seq is a newtype wrapper around FingerTree (Elem a) anyway,
> Seq and NonEmptySeq would become newtype wrappers around FingerTree 'E
> (Elem a) and FingerTree 'NE (Elem a). This avoids the majority of the code
> duplication.
>
> On Thu, Apr 25, 2019, 12:34 John Ericson <john.ericson at obsidian.systems>
> wrote:
>
>> Thanks for the clarification, David. Sounds like a good wait-and-see
>> thing: after this lands, Zemyla's proposal and the experience with
>> NonEmpty{Map,Set} can inform what Seq should end up looking like.
>>
>> John
>> On 4/25/19 1:28 PM, David Feuer wrote:
>>
>> No, a Seq can be Empty at the bottom too. It's definitely not a mutually
>> recursive situation like Set or Map, and it's not immediately a top-level
>> one like IntSet or IntMap. A nonempty sequence type would need somewhat
>> different functions for everything. In most cases, the changes seem fairly
>> straightforward. I'm not sure about weird functions like inits and tails.
>> Some extra care might be required for replicate.
>>
>> I believe it would be possible to restructure things to make the
>> distinction a top-level one, by using a possibly-empty type for the top and
>> a nonempty one below. Zemyla has already identified some very solid
>> unrelated reasons to want to separate the tops from the rest, but I'm
>> somewhat concerned about source code duplication with that general
>> approach. Even if we do that, it's not clear to me that we can make a
>> non-empty/possibly-empty distinction without incurring a performance
>> penalty.
>>
>> On Thu, Apr 25, 2019, 1:17 PM John Ericson
>> <john.ericson at obsidian.systems> <john.ericson at obsidian.systems> wrote:
>>
>>> I haven't looked into `Seq` in addition to `Map` and `Set`, just
>>> `IntSet` and `IntMap`. But it might be a similar thing? I take it that
>>> with `Seq` today only the root can be empty and everything else is
>>> single or deep? That means we *would* just use a single Maybe-like thing
>>> at the top level, no mutual recursion. But on the other hand to make
>>> that to make that work efficiently we would would need GHC to support
>>> unboxing sums, so 1 + 2 variants can become a flat 3.
>>>
>>> Also, https://github.com/haskell/containers/pull/616 is now where the
>>> actual implementation is happening, not just the datatype changes as
>>> before. Feel free to comment on the concrete work in progress, everyone!
>>>
>>> John
>>>
>>> On 4/25/19 11:36 AM, Zemyla wrote:
>>> > A Seq has either Empty, Single, or Deep. A NonEmptySeq would have just
>>> > Single or Deep.
>>> >
>>> > On Thu, Apr 25, 2019, 09:55 David Feuer <david.feuer at gmail.com
>>> > <mailto:david.feuer at gmail.com>> wrote:
>>> >
>>> >     I don't see the benefit there, unless you see a way to work it
>>> >     into the representation.
>>> >
>>> >     On Thu, Apr 25, 2019, 10:53 AM Zemyla <zemyla at gmail.com
>>> >     <mailto:zemyla at gmail.com>> wrote:
>>> >
>>> >         As long as we're doing this, can we also add NonEmptySeq as
>>> well?
>>> >
>>> >         On Thu, Apr 25, 2019, 09:11 Artyom Kazak <yom at artyom.me
>>> >         <mailto:yom at artyom.me>> wrote:
>>> >
>>> >             I'm -1 on any kind of |Map = NEMap|.
>>> >
>>> >             An ordinary map and a non-empty map are semantically
>>> >             different. I believe that if I non-empty maps were already
>>> >             in |containers|, I would pretty much always care whether a
>>> >             |Map| I see in code is a 0-map or 1-map.
>>> >
>>> >             Similarly, I prefer |Int| and |Word| instead of |Int| and
>>> >             |Unsigned.Int|. (Luckily that's already the case.)
>>> >
>>> >             We already have a precedent with |Text| and |ByteString|,
>>> >             where the lazy and the strict versions are only
>>> >             distinguished by the module prefix. In my experience,
>>> >             modules where both are used are pretty common, and I end
>>> >             up just introducing |type LByteString = Lazy.ByteString|
>>> >             in all my projects, because otherwise I need to scroll to
>>> >             the imports section whenever I need to know which flavor
>>> >             of bytestring is being used. (Or if I'm reading haddocks,
>>> >             I have to look at the link because Haddock hides module
>>> >             prefixes.)
>>> >
>>> >             "why not both" is even worse. I still can't trust the
>>> >             |Map|, but now I also have to learn and remember that two
>>> >             modules are the same. Speaking from experience again –
>>> >             most people seem to be surprised by the fact that
>>> >             |Data.Map.Lazy| and |Data.Map.Strict| export the same
>>> >             |Map| type. The proposed module hierarchy would move
>>> >             |containers| to the top of my "packages that confuse
>>> >             beginners" list, beating even |aeson|.
>>> >
>>> >             As an aside, I wish we had a proper interface for
>>> >             container-like structures, or at least a solution to name
>>> >             scoping. I really like the way Rust does it, for instance,
>>> >             where certain functions can be "attached" to a type – I'm
>>> >             hesitant to call them "methods" because Rust is not an OOP
>>> >             language.
>>> >             On Apr 25 2019, at 2:49 pm, Mario Blažević
>>> >             <mblazevic at stilo.com <mailto:mblazevic at stilo.com>> wrote:
>>> >
>>> >                 On 2019-04-18 11:00 p.m., David Feuer wrote:
>>> >
>>> >                     I'm in favor of the proposal. I find the
>>> >                     isomorphism between Map (a,b) v
>>> >                     and Map a (NonemptyMap b v) very pleasant. The
>>> >                     fact that others have
>>> >                     written less-performant implementations of this
>>> >                     idea is rather
>>> >                     convincing. The fact that doing this removes
>>> >                     partial matches in the
>>> >                     implementation is nice. And I'll take performance
>>> >                     improvements where I
>>> >                     can get them. The main question is the proper name
>>> >                     of the type. Just
>>> >                     Data.Map.Nonempty.Map, or .NonemptyMap? Should the
>>> >                     empty be capitalized?
>>> >
>>> >
>>> >                 There seems to be a consensus for
>>> >                 Data.Map.NonEmpty.NEMap, with the
>>> >                 type and the functions slightly off the regular ones.
>>> >                 This design would
>>> >                 make it easier to use regular and non-empty containers
>>> >                 together, but it
>>> >                 be annoying for the use case of replacing all uses of
>>> >                 an existing
>>> >                 regular container with a non-empty one. I'd rather
>>> >                 change just the
>>> >                 import declaration than all occurrences of the type
>>> >                 name and functions.
>>> >
>>> >                 I don't want to derail the implementation with
>>> >                 bikeshedding, so I'm
>>> >                 just going to ask why not both? The library can both
>>> >                 export the tweaked
>>> >                 names and add a module, say Data.NonEmpty.Map.Lazy,
>>> >                 that exports the
>>> >                 type synonym Map = NEMap. It would also rename all the
>>> >                 functions back to
>>> >                 their names from Data.Map.Lazy.
>>> >
>>> >
>>> >
>>> >                     On Thu, Apr 18, 2019, 7:15 PM John Cotton Ericson
>>> >                     <John.Ericson at obsidian.systems>
>>> <John.Ericson at obsidian.systems> wrote:
>>> >
>>> >                     In
>>> >                     https://github.com/haskell/containers/issues/608 I
>>> >                     proposed
>>> >                     adding non-empty variants of Map and Set,
>>> analogous to
>>> >                     Data.List.NonEmpty for List, to containers.
>>> >                     semigroupoids
>>> >                     demonstrates the many uses and structure of
>>> >                     non-empty containers in
>>> >                     general, and libraries such as
>>> >                     https://github.com/mstksg/nonempty-containers and
>>> >                     https://github.com/andrewthad/non-empty-containers
>>> >                     demonstrate the
>>> >                     interest in non-empty maps and sets in particular.
>>> >                     My favorite
>>> >                     use-case is that they're needed to "curry"
>>> >                     containers: for example,
>>> >                     |Map (k0, k1) v| is isomorphic not to |Map k0 (Map
>>> >                     k1 v)| but to
>>> >                     |Map k0 (NonEmptyMap k1 v)|. I like this use-case
>>> >                     because it comes
>>> >                     from the containers themselves.
>>> >
>>> >                     Importantly, there's no good way to do this
>>> >                     outside of containers;
>>> >                     doing so leads to imbalancing / extra indirection,
>>> >                     or massive code
>>> >                     duplication. If one wraps the container was an
>>> >                     extra value like
>>> >                     Data.List.NonEmpty, one's left with an unavoidable
>>> >                     extra
>>> >                     indirection/imbalance. One can rectify this by
>>> >                     copying and modifying
>>> >                     the implementation of containers, but that's
>>> >                     hardly maintainable;
>>> >                     even as though the algorithms are the same, enough
>>> >                     lines are touched
>>> >                     that merging upstream containers is nigh
>>> impossible.
>>> >
>>> >                     On the other hand, the non-empty containers can be
>>> >                     elegantly and
>>> >                     sufficiently implemented alongside their originals
>>> >                     by taking the Bin
>>> >                     constructor and breaking it out into it's own
>>> >                     type, mutually
>>> >                     recursive with the original. This avoids the
>>> >                     indirection/imbalancing
>>> >                     and code duplication problems: the algorithms work
>>> >                     exactly as before
>>> >                     creating the same trees (remember the UNPACK), and
>>> >                     no code
>>> >                     duplicated since the functions become mutually
>>> >                     recursive matching
>>> >                     the types.
>>> >
>>> >                     To briefly summarize the thread:
>>> >
>>> >                     1. I proposed the issue after performing this same
>>> >                     refactor on the
>>> >                     dependent-map package:
>>> >
>>> https://github.com/obsidiansystems/dependent-map/tree/non-empty,
>>> >                     a fork of containers.
>>> >                     2. I made
>>> >                     https://github.com/haskell/containers/pull/616
>>> >                     which just
>>> >                     changes the types, to make sure UNPACK preserved
>>> >                     the importance.
>>> >                     3.
>>> >
>>> https://gist.github.com/Ericson2314/58709d0d99e0c0e83ad266701cd71841
>>> >                     the benchmarks showed rather than degrading
>>> >                     performance, PR 616
>>> >                     actually /improved/ it.
>>> >
>>> >                      If there is preliminary consensus, I'll make a
>>> >                     second PR on top
>>> >                     which generalizes the functions like on my
>>> >                     dependent-map branch.
>>> >
>>> >                     Thanks,
>>> >
>>> >                     John
>>> >
>>> >                     _______________________________________________
>>> >                     Libraries mailing list
>>> >                     Libraries at haskell.org
>>> >                     <mailto:Libraries at haskell.org>
>>> >                     <mailto:Libraries at haskell.org
>>> >                     <mailto:Libraries at haskell.org>>
>>> >
>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>>> >
>>> >
>>> >                     _______________________________________________
>>> >                     Libraries mailing list
>>> >                     Libraries at haskell.org <mailto:
>>> Libraries at haskell.org>
>>> >
>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>>> >
>>> >
>>> >                 _______________________________________________
>>> >                 Libraries mailing list
>>> >                 Libraries at haskell.org <mailto:Libraries at haskell.org>
>>> >
>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>>> >
>>> >             _______________________________________________
>>> >             Libraries mailing list
>>> >             Libraries at haskell.org <mailto:Libraries at haskell.org>
>>> >             http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>>> >
>>> >         _______________________________________________
>>> >         Libraries mailing list
>>> >         Libraries at haskell.org <mailto:Libraries at haskell.org>
>>> >         http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>>> >
>>> >
>>> > _______________________________________________
>>> > Libraries mailing list
>>> > Libraries at haskell.org
>>> > http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/libraries/attachments/20190425/29d25a2e/attachment.html>


More information about the Libraries mailing list