Proposal #2659: Add sortOn and friends to Data.List

Twan van Laarhoven twanvl at gmail.com
Mon Oct 6 16:50:29 EDT 2008


Mitchell, Neil wrote:
> Hi Twan,
> 
> 
>>Almost all uses of sortBy in user code use 'comparing', 'on' 
>>or a similar construction [1]. I think we should add a 
>>function that makes this common behavior more convenient:
>>
>>     sortOn :: Ord b => (a -> b) -> [a] -> [a]
>>
>>For consistency we should also add *On for the other *By 
>>functions in Data.List:
>>
>>     nubOn           :: Eq b => (a -> b) -> [a] -> [a]
>>     deleteOn        :: Eq b => (a -> b) -> a -> [a] -> [a]
>>     deleteFirstsOn  :: Eq b => (a -> b) -> [a] -> [a] -> [a]
>>     unionOn         :: Eq b => (a -> b) -> [a] -> [a] -> [a]
>>     intersectOn     :: Eq b => (a -> b) -> [a] -> [a] -> [a]
>>     groupOn         :: Eq b => (a -> b) -> [a] -> [[a]]
>>     sortOn          :: Ord b => (a -> b) -> [a] -> [a]
>>     insertOn        :: Ord b => (a -> b) -> a -> [a] -> [a]
>>     maximumOn       :: Ord b => (a -> b) -> [a] -> a
>>     minimumOn       :: Ord b => (a -> b) -> [a] -> a
>>    (nubSortOn       :: Ord b => (a -> b) -> [a] -> [a]) -- see #2629
> 
> 
> All good, apart from deleteFirstsOn. I honestly don't know what that
> function does, I was thinking possibly something like the first
> component of tuples? But no real idea.

deleteFirstBy is "\\By", so it should perhaps be called differenceBy. But this 
is just the name that the List module uses. The reason for adding deleteFirstOn 
is simply consistency, since we also have a By version.

> In Hoogle I use sortOn to be the non-cacheing version, and sortWith to
> be the cacheing version. However, I think we can just ignore the
> non-cacheing version - if your extraction function is expensive you
> probably want to think a bit anyway.

If the extraction function is slow, like say "sortOn length", caching is a clear 
win. On the other hand, there is some constant cost involved.

Here are some benchmarks of sorting words in a dictionary:

                 no cache     cache
   sortOn f      3.203125     0.625000
   sortOn id     0.156250     0.187500

where f = length . nub . sort, i.e. an expensive function.

I'm not sure if the factor 1.2 overhead of the caching version is important 
enough to warrant exposing two different implementations.


Twan


More information about the Libraries mailing list