[Haskell-cafe] is this a bug ?
daniel.is.fischer at web.de
Sat Jul 17 11:10:37 EDT 2010
On Saturday 17 July 2010 05:39:00, gate03 at landcroft.co.uk wrote:
> On Sat 17/07/10 04:17 , Alexander Solla ajs at 2piix.com sent:
> > Why are you performing unsafe IO actions? They don't play nice
> > with laziness.
> OK, fair cop, but without the unsafe IO action, it still misbehaves.
Source-diving reveals: it's a bug.
Text.Regex.Posix.ByteString.Lazy is just a thin wrapper around the strict
variant: lazy ByteStrings are transformed into strict ones before the
functions of Text.Regex.Posix.ByteString are called.
To avoid copying twice, if the lazy ByteString does not end with a '\0', a
'\0' is snoc'ed to the end before transforming to a strict ByteString.
Thus the regexec of Text.Regex.Posix.ByteString takes its slices from a
ByteString that is one byte longer than the original, and no measures are
taken to chop the trailing '\0' off again.
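
To make the mechanism concrete, here is a minimal sketch using only the
bytestring package. It is not the regex-posix source: toPadded, sliceMatch,
buggySlices and fixedSlices are hypothetical names, and the match offset and
length are passed in by hand rather than coming from a real regexec call.

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy.Char8 as L

-- Hypothetical stand-in for the wrapper's conversion step: produce a strict
-- copy that ends in '\0', and report whether the terminator had to be added.
toPadded :: L.ByteString -> (Bool, B.ByteString)
toPadded lbs
  | not (L.null lbs) && L.last lbs == '\0' = (False, B.concat (L.toChunks lbs))
  | otherwise = (True, B.concat (L.toChunks (L.snoc lbs '\0')))

-- Cut (before, matched, after) out of a buffer, given a match offset and length.
sliceMatch :: Int -> Int -> B.ByteString
           -> (B.ByteString, B.ByteString, B.ByteString)
sliceMatch off len buf = (before, matched, after)
  where
    (before, rest)   = B.splitAt off buf
    (matched, after) = B.splitAt len rest

-- Slices taken from the padded buffer: the added '\0' leaks into "after".
buggySlices :: Int -> Int -> L.ByteString
            -> (B.ByteString, B.ByteString, B.ByteString)
buggySlices off len lbs = sliceMatch off len (snd (toPadded lbs))

-- One possible repair (an assumption, not the actual patch): drop the
-- terminator again if, and only if, the wrapper added it.
fixedSlices :: Int -> Int -> L.ByteString
            -> (B.ByteString, B.ByteString, B.ByteString)
fixedSlices off len lbs = sliceMatch off len buf
  where
    (added, padded) = toPadded lbs
    buf = if added then B.init padded else padded
```

With the input "foobar" and a match at offset 0 of length 3, buggySlices
returns "bar\0" as the text after the match, while fixedSlices returns "bar".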
A related problem is that ByteStrings (and Strings) may legitimately
contain '\0's, but regex-posix (and probably [almost] all other regex
packages) passes them to C as CStrings. The regex functions therefore stop
processing at the first '\0', which on the Haskell side may be only a
small part of the string.
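
The truncation can be seen without any regex package at all. The following
is a small sketch using only bytestring; asSeenByC and hasEmbeddedNul are
hypothetical helper names, and the round trip through a CString merely
imitates what any C function relying on NUL termination would see.

```haskell
import qualified Data.ByteString.Char8 as B

-- Round-trip through a NUL-terminated C string. packCString reads up to
-- the first '\0', so everything after an embedded NUL is lost.
asSeenByC :: B.ByteString -> IO B.ByteString
asSeenByC bs = B.useAsCString bs B.packCString

-- Pure check a caller could run before handing data to a C-backed matcher.
hasEmbeddedNul :: B.ByteString -> Bool
hasEmbeddedNul = B.elem '\0'
```

For B.pack "abc\0def", which has length 7 on the Haskell side, asSeenByC
yields only "abc".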