HEAD: Deterioration in ByteString I/O
Simon Marlow
marlowsd at gmail.com
Thu Sep 9 07:19:23 EDT 2010
On 09/09/2010 00:28, Daniel Fischer wrote:
> On Wednesday 08 September 2010 23:55:35, Don Stewart wrote:
>> simonpj:
>>> |> ghc-6.12.3:
>>> |> 89,330,672 bytes allocated in the heap
>>> |> 15,092 bytes copied during GC
>>> |> 35,980 bytes maximum residency (1 sample(s))
>>> |> 29,556 bytes maximum slop
>>> |> 2 MB total memory in use (0 MB lost due to
>>> |> fragmentation)
>>> |>
>>> |> ghc-6.13.20100831:
>>> |> 475,305,720 bytes allocated in the heap
>>> |> 89,272 bytes copied during GC
>>> |> 68,860 bytes maximum residency (1 sample(s))
>>> |> 29,444 bytes maximum slop
>>> |> 2 MB total memory in use (0 MB lost due to
>>> |> fragmentation)
>>> |
>>> | Can you put your benchmark code somewhere? Likely a GHC
>>> | regression.
>>>
>>> Indeed bad. If someone could characterise the regression more
>>> precisely (e.g. "fusion isn't happening here") that would be jolly
>>> helpful.
>>
>> Shouldn't be fusion. Is this a straight IO function. Something to do
>> with buffering/encoding?
>
> Maybe the following observation helps:
>
> ghc-6.13.20100831 reads lazy ByteStrings in chunks of 8192 bytes.
>
> If I understand correctly, that means (since defaultChunkSize = 32760)
> - bytestring allocates a 32K buffer to be filled and asks ghc for 32760
> bytes in that buffer
> - ghc asks the OS for 8192 bytes (and usually gets them)
> - upon receiving fewer bytes than requested, bytestring copies them to a
> new smaller buffer
> - since the number of bytes received is a multiple of ghc's allocation
> block size (which I believe is 4K), there's no space for the bookkeeping
> overhead, hence the new buffer takes up 12K instead of 8, resulting in 44K
> allocation for 8K bytes
>
> That factor of 5.5 corresponds pretty well with the allocation figures
> above, and the extra copying explains the approximate doubling of I/O time.
>
> Trying to find out why ghc asks the OS for only 8192 bytes instead of 32760
> hasn't brought enlightenment yet.
I think I've found the problem, GHC.IO.Handle.Text:
bufReadNBEmpty :: Handle__ -> Buffer Word8 -> Ptr Word8 -> Int -> Int ->
IO Int
bufReadNBEmpty h_ at Handle__{..}
buf at Buffer{ bufRaw=raw, bufR=w, bufL=r, bufSize=sz }
ptr so_far count
| count > sz, False,
Just fd <- cast haDevice = do
m <- RawIO.readNonBlocking (fd::FD) ptr count
case m of
Nothing -> return so_far
Just n -> return (so_far + n)
See if you can spot it. I must have made this change for debugging, and
forgot to revert it. d'oh!
Cheers,
Simon
>
> Cheers,
> Daniel
>
> Excerpt of strace log:
>
> read(3, "%!PS-Adobe-2.0\n%%Title: nbench\n%"..., 8192) = 8192
> open("/usr/lib/gconv/UTF-32.so", O_RDONLY) = 4
> read(4,
> "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0`\4\0\0004\0\0\0"..., 512)
> = 512
> fstat64(4, {st_mode=S_IFREG|0755, st_size=9672, ...}) = 0
> mmap2(NULL, 12328, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 4, 0) =
> 0xb7852000
> fadvise64(4, 0, 12328, POSIX_FADV_WILLNEED) = 0
> mmap2(0xb7854000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|
> MAP_DENYWRITE, 4, 0x1) = 0xb7854000
> close(4) = 0
> mprotect(0xb7854000, 4096, PROT_READ) = 0
> ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbff37cb0) = -1 ENOTTY
> (Inappropriate ioctl for device)
> select(2, [], [1], NULL, {0, 0}) = 1 (out [1], left {0, 0})
> write(1, "", 0) = 0
> select(2, [], [1], NULL, {0, 0}) = 1 (out [1], left {0, 0})
> write(1, "%!PS-Adobe-2.0\n%%Title: nbench\n%"..., 8192) = 8192
> read(3, " 20.000000 lineto\n121.153524 20."..., 8192) = 8192
> select(2, [], [1], NULL, {0, 0}) = 1 (out [1], left {0, 0})
> write(1, "", 0) = 0
> select(2, [], [1], NULL, {0, 0}) = 1 (out [1], left {0, 0})
> write(1, " 20.000000 lineto\n121.153524 20."..., 8192) = 8192
> read(3, "30.542315 21.394403 lineto\n125.3"..., 8192) = 8192
> select(2, [], [1], NULL, {0, 0}) = 1 (out [1], left {0, 0})
> write(1, "", 0) = 0
> select(2, [], [1], NULL, {0, 0}) = 1 (out [1], left {0, 0})
> write(1, "30.542315 21.394403 lineto\n125.3"..., 8192) = 8192
> read(3, "neto\n308.929337 21.969871 lineto"..., 8192) = 8192
> _______________________________________________
> Glasgow-haskell-users mailing list
> Glasgow-haskell-users at haskell.org
> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
More information about the Glasgow-haskell-users
mailing list