[Haskell-cafe] [ANNOUNCE] (and request for review): directory-tree v0.9.0

Brandon Simmons brandon.m.simmons at gmail.com
Tue Aug 10 20:54:20 EDT 2010


On Tue, Aug 10, 2010 at 4:34 PM, Jason Dagit <dagit at codersbase.com> wrote:
>
>
> On Mon, Aug 9, 2010 at 10:48 PM, Brandon Simmons
> <brandon.m.simmons at gmail.com> wrote:
>>
>> Greetings Haskellers!
>>
>> directory-tree is a module providing a directory-tree-like datatype
>> along with Foldable and Traversable instances, along with a simple,
>> high-level IO interface. You can see the package along with some
>> examples here (apologies if the haddock docs haven't been generated
>> yet) :
>>
>>    http://hackage.haskell.org/package/directory-tree
>
> If I understand what you're saying, then your library is very similar to an
> abstraction that darcs had for years known as "Slurpy".  The experience in
> the darcs project was that it led to performance issues and correctness
> issues that were hard to find/fix.
>>
>> The primary change in this release is the addition of two
>> experimental "lazy" functions: `readDirectoryWithL` and `buildL`.
>> These functions use `unsafePerformIO` behind the scenes to traverse
>> the filesystem as required by pure computations consuming the returned
>> DirTree data structure. I believe I am doing this safely and sanely
>> but would love if some more experienced folks could comment on the
>> code.
>
> unsafePerformIO or unsafeInterleaveIO?
> Either way, to me it seems a bit dangerous to be doing this sort of lazy IO.
>  If the directory structure is large will I run out of file handles?  How
> will IO errors be handled?  Will I receive the exceptions in pure code or
> inside my IO actions?  Will I run into space leaks if something holds on to
> 1 file and then references it "after" the directory traversal?  I might have
> my history wrong, but as I recall darcs started with lazy slurpies and moved
> to doing things strictly due to space leaks, running out of file
> descriptors, file descriptor leaks (not running out, but having the file be
> locked long after darcs should have been 'done' with it), and exception
> delivery.

IO errors are caught and returned inside the tree, in a pure constructor
called "Failed". In practice I think my unsafe version is better in many
of those respects than the original, for example with regard to running
out of file handles. Are you referring to lazy IO in general, which the
problems you mention seem to apply to, or specifically to the use of
unsafePerformIO?
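
To make the idea concrete, here is a minimal sketch of a lazily built
directory tree in which IO errors end up in a pure "Failed" node. It is
not the package's actual code: the `Tree` type and the `buildLazy`
function are hypothetical names made up for illustration, and the sketch
uses `unsafeInterleaveIO` (the alternative Jason asks about) rather than
`unsafePerformIO`.

```haskell
import Control.Exception (IOException, try)
import System.Directory (doesDirectoryExist, getDirectoryContents)
import System.FilePath ((</>))
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical, simplified tree type; not the package's real DirTree.
data Tree
  = File FilePath               -- anything that is not a directory
  | Dir FilePath [Tree]         -- a directory and its (lazy) children
  | Failed FilePath IOException -- an IO error captured as a pure value

-- Build the tree lazily: the filesystem is only touched for a node
-- when that node is first demanded by the consumer.
buildLazy :: FilePath -> IO Tree
buildLazy path = unsafeInterleaveIO $ do
  result <- try $ do
    isDir <- doesDirectoryExist path
    if isDir
      then do
        names <- getDirectoryContents path
        let children = filter (`notElem` [".", ".."]) names
        -- Each recursive call returns a deferred node immediately,
        -- so listing this directory does not descend into children.
        Dir path <$> mapM (buildLazy . (path </>)) children
      else return (File path)
  -- An IOException raised above becomes a Failed node instead of
  -- propagating to whoever forces this part of the tree.
  return (either (Failed path) id result)
```

Under these assumptions, pattern matching on the result of
`buildLazy "."` lists only the top-level directory; a subdirectory is
read when its own node is inspected, and an unreadable one shows up as
`Failed` rather than as an exception in pure code.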

I certainly want this module to be as useful and problem-free as
possible, but I will be content if it is no more problematic than lazy
IO itself.

Could you elaborate on this?

    > Will I run into space leaks if something holds on to 1 file and
    > then references it "after" the directory traversal?

> It's a seductive path but one that does not seem to have a good ending.
> I'm not sure what darcs uses these days.  Perhaps that's what hashed-storage
> provides, although I haven't been able to find any documentation on
> hashed-storage other than the haddocks (which only document the api with no
> overview or explanation of the problem hashed-storage solves).
> Jason

Eric Kow just pointed out the existence of hashed-storage to me (I
believe you are right that it is what darcs uses or will use), and it
will be interesting to see the approach in there, if I can grok it.

Thanks a lot for the input.

Brandon Simmons
http://coder.bsimmons.name

