Proposal: Improve error messages for (!!) (include index and length of list)

Fri Oct 17 12:32:25 UTC 2014

Well, it still needs to remember the value of the initial index
somewhere, but in this case it puts it on the stack. This version is
indeed faster than the one that computes. This seems to cost 4-5 ns
for short lists. Interestingly, this version seems to sometimes be a
bit faster for longer lists, but that's probably not what we should
optimise for. Benchmarks below.

{-# LANGUAGE MagicHash #-}
module Main where

import Criterion.Main

import GHC.Exts (Int(..), (+#), (-#))

nth1 :: [a] -> Int -> a
nth1 ys n | n < 0 = error $ "Prelude.(!!): negative index " ++ show n
nth1 ys (I# i) = go ys i
 where
   go (x:_)  0#  = x
   go (_:xs) idx = go xs (idx -# 1#)
   go []     idx = error $ "Prelude.(!!): index " ++ show (I# i) ++
                           " too large for list of length " ++ show
(I# (i -# idx))

nth2 :: [a] -> Int -> a
nth2 ys n | n < 0 = error $ "Prelude.(!!): negative index " ++ show n
nth2 ys (I# i) = go ys i 0#
 where
   go (x:_)  0#  _len = x
   go (_:xs) idx len  = go xs (idx -# 1#) (len +# 1#)
   go []     idx len  = error $ "Prelude.(!!): index " ++ show (I#
(idx +# len)) ++
                                " too large for list of length " ++
show (I# len)

main = do
  let l1 = [1..5]
  let l2 = [1..500]
  defaultMain
    [ bgroup "nth"
      [ bench "3/old" $ whnf (l1 !!) 3
      , bench "3/new/let-no-escape" $ whnf (l1 `nth1`) 3
      , bench "3/new/more-compute" $ whnf (l1 `nth2`) 3
      , bench "300/old" $ whnf (l2 !!) 300
      , bench "300/new/let-no-escape" $ whnf (l2 `nth1`) 300
      , bench "300/new/more-compute" $ whnf (l2 `nth2`) 300
      ]
    ]

benchmarking nth/3/old
time                 44.30 ns   (43.94 ns .. 44.65 ns)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 44.24 ns   (43.99 ns .. 44.58 ns)
std dev              1.001 ns   (837.9 ps .. 1.221 ns)
variance introduced by outliers: 34% (moderately inflated)

benchmarking nth/3/new/let-no-escape
time                 51.68 ns   (51.36 ns .. 52.01 ns)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 51.70 ns   (51.38 ns .. 52.05 ns)
std dev              1.146 ns   (922.3 ps .. 1.495 ns)
variance introduced by outliers: 33% (moderately inflated)

benchmarking nth/3/new/more-compute
time                 53.44 ns   (53.12 ns .. 53.77 ns)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 53.55 ns   (53.16 ns .. 54.12 ns)
std dev              1.591 ns   (1.089 ns .. 2.354 ns)
variance introduced by outliers: 47% (moderately inflated)

benchmarking nth/300/old
time                 747.4 ns   (740.7 ns .. 753.6 ns)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 742.9 ns   (737.1 ns .. 751.4 ns)
std dev              23.00 ns   (17.75 ns .. 33.61 ns)
variance introduced by outliers: 43% (moderately inflated)

benchmarking nth/300/new/let-no-escape
time                 742.7 ns   (736.7 ns .. 749.2 ns)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 745.1 ns   (738.8 ns .. 757.8 ns)
std dev              28.11 ns   (19.71 ns .. 42.69 ns)
variance introduced by outliers: 53% (severely inflated)

benchmarking nth/300/new/more-compute
time                 810.2 ns   (801.5 ns .. 818.3 ns)
                     0.999 R²   (0.999 R² .. 0.999 R²)
mean                 812.9 ns   (804.4 ns .. 824.5 ns)
std dev              33.12 ns   (25.91 ns .. 43.59 ns)
variance introduced by outliers: 57% (severely inflated)

On 17 October 2014 13:41, Twan van Laarhoven <twanvl at gmail.com> wrote:
> You don't need an extra parameter for calculating the length, because
>   len = n - idx
> So,
>
>     xs !! (I# n) = go xs n
>       where
>       go [] idx = error $ "... Index " ++ show (I# n)
>               ++ " too large for list of length" ++ show (I# (n -# idx))
>       go (x:_) 0# = x
>       go (_:xs) idx = go xs (idx +# 1)
>
> By the way, do we still need all the manual unboxing with a modern Ghc?
>
> Twan
>
>
> On 2014-10-16 14:13, Thomas Schilling wrote:
>>
>> Yes, you'd have to calculate the length on the fly. i.e., something
>> like this (untested):
>>
>> xs !! n | n < 0 = error "... negative index ..."
>> xs !! (I# n) = go xs n 0#
>>    where
>>      go [] idx len = error $ "... Index " ++ show (I# (idx +# len) ++ "
>> too large for list of length "
>>                                    ++ show (I# len)
>>      go (x:_) 0# _ = x
>>      go (_:xs) idx len = go xs (idx -# 0#) (len +# 1#)
>>
>> On modern processors the extra addition and the extra parameter
>> shouldn't hurt, though we'd need a benchmark to make sure, of course.
>> You could also make the error message a bit less helpful and just
>> return how far the index pointed past the end of the list.
>>
>>
>> On 16 October 2014 13:46, Herbert Valerio Riedel <hvr at gnu.org> wrote:
>>>
>>> On 2014-10-16 at 08:20:55 +0200, Simon Hengel wrote:
>>>
>>> [...]
>>>
>>>> I propose to change the error messages for the non-report version to
>>>> include index and list length, something that is functionally equivalent
>>>> to:
>>>
>>>
>>> While I'm very sympathetic to better error messages; doesn't the
>>> implementation you gave defer garbage-collecting the start of the list,
>>> by keeping the head of the list alive until either the desired index has
>>> been reached or end-of-list is detected?
>>>
>>> e.g. consider something (silly) like ([1..] !! 10000000)
>>>
>>> Cheers,
>>>    hvr
>>> _______________________________________________
>>> Libraries mailing list
>>> Libraries at haskell.org
>>> http://www.haskell.org/mailman/listinfo/libraries
>>
>> _______________________________________________
>> Libraries mailing list
>> Libraries at haskell.org
>> http://www.haskell.org/mailman/listinfo/libraries
>>
> _______________________________________________
> Libraries mailing list
> Libraries at haskell.org
> http://www.haskell.org/mailman/listinfo/libraries