Proposal #2659: Add sortOn and friends to Data.List
Twan van Laarhoven
twanvl at gmail.com
Mon Oct 6 16:50:29 EDT 2008
Mitchell, Neil wrote:
> Hi Twan,
>
>
>>Almost all uses of sortBy in user code use 'comparing', 'on'
>>or a similar construction [1]. I think we should add a
>>function that makes this common behavior more convenient:
>>
>> sortOn :: Ord b => (a -> b) -> [a] -> [a]
>>
>>For consistency we should also add *On for the other *By
>>functions in Data.List:
>>
>> nubOn :: Eq b => (a -> b) -> [a] -> [a]
>> deleteOn :: Eq b => (a -> b) -> a -> [a] -> [a]
>> deleteFirstsOn :: Eq b => (a -> b) -> [a] -> [a] -> [a]
>> unionOn :: Eq b => (a -> b) -> [a] -> [a] -> [a]
>> intersectOn :: Eq b => (a -> b) -> [a] -> [a] -> [a]
>> groupOn :: Eq b => (a -> b) -> [a] -> [[a]]
>> sortOn :: Ord b => (a -> b) -> [a] -> [a]
>> insertOn :: Ord b => (a -> b) -> a -> [a] -> [a]
>> maximumOn :: Ord b => (a -> b) -> [a] -> a
>> minimumOn :: Ord b => (a -> b) -> [a] -> a
>> (nubSortOn :: Ord b => (a -> b) -> [a] -> [a]) -- see #2629
>
>
> All good, apart from deleteFirstsOn. I honestly don't know what that
> function does, I was thinking possibly something like the first
> component of tuples? But no real idea.
deleteFirstBy is "\\By", so it should perhaps be called differenceBy. But this
is just the name that the List module uses. The reason for adding deleteFirstOn
is simply consistency, since we also have a By version.
> In Hoogle I use sortOn to be the non-cacheing version, and sortWith to
> be the cacheing version. However, I think we can just ignore the
> non-cacheing version - if your extraction function is expensive you
> probably want to think a bit anyway.
If the extraction function is slow, like say "sortOn length", caching is a clear
win. On the other hand, there is some constant cost involved.
Here are some benchmarks of sorting words in a dictionary:
no cache cache
sortOn f 3.203125 0.625000
sortOn id 0.156250 0.187500
where f = length . nub . sort, i.e. an expensive function.
I'm not sure if the factor 1.2 overhead of the caching version is important
enough to warrant exposing two different implementations.
Twan
More information about the Libraries
mailing list