Proposal for containers: Add 'lookup' function to Data.Set

Ben Millwood haskell at benmachine.co.uk
Thu Jul 14 15:33:35 UTC 2016


I agree with Henning, and moreover with wren's point that Data.Set is an 
often-unsatisfactory data structure for interning in general. Map can do 
the job more obviously and more comprehensively, at the cost of a few 
more pointers -- and when has Haskell ever been about trading clarity 
for fewer pointers?

I don't think I'll formally vote, since I haven't written any Haskell in 
months, but I think the addition of this at-first-glance 
seemingly-useless function adds surprise and a little bit of 
headscratching to the user interface as a whole, to the extent that I 
feel like it has a "weirdness cost" even if most users can just ignore 
it. It also complicates (in my mind) the semantic story of Data.Set, 
from something purely about a collection of elements and a membership 
test to something which has specific opinions on which of two notionally 
"equal" things are most appropriate in a given situation. I think the 
cost of turning Data.Set from a simple thing into a complicated thing is 
a subtle one that should not totally be overlooked -- API simplicity is 
a virtue in itself.

Add to that Henning and wren's evidence, which I interpret as saying 
that this function isn't quite the *perfect* tool for seemingly *any* 
job, and it doesn't seem worth the cognitive / maintenance cost to me.

(I perhaps do not expect these considerations to be very strong or very 
important, but I think it would be unwise to ignore them altogether.)

On Tue, Jun 28, 2016 at 11:19:16AM +0200, Henning Thielemann wrote:
>
>On Mon, 27 Jun 2016, Nicolas Godbout wrote:
>
>>Example use case: In a parser, the memory footprint can be reduced by collapsing
>>all equal strings to a single instance of each string. To achieve this, one needs
>>a way to get a previously seen string (internally, a pointer) equal to a newly
>>parsed string. Amazingly, this is very difficult with the current "containers" library interface.
>>One current option is to use a Map instead, e.g., 'Map String String'
>>which stores twice as many pointers as necessary.
>
>I think I'd prefer the (Map String String) variant to an obscure Set 
>lookup function. I wonder whether you will later use the Map anyway, 
>as the compiler grows and you need to attach more data to tokens. In 
>order to make your intent clearer you might define
>
>  newtype SharedToken = SharedToken String
>
>and use (Map String SharedToken).
>_______________________________________________
>Libraries mailing list
>Libraries at haskell.org
>http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries


More information about the Libraries mailing list