[Haskell-cafe] ANNOUNCE: posix-paths, for faster file system operations
Niklas Hambüchen
mail at nh2.me
Wed Aug 21 04:29:35 CEST 2013
John Lato and I would like to announce our posix-paths package.
https://github.com/JohnLato/posix-paths
It implements a large portion of System.Posix.FilePath using ByteString-based
RawFilePaths instead of String-based FilePaths, and on top of that
provides a Traversal module with a fast replacement for
`getDirectoryContents` and a recursive `allDirectoryContents`.
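To illustrate what working with RawFilePaths looks like, here is a minimal sketch that lists one directory using only the ByteString API from the `unix` package. This is not the posix-paths implementation; the helper name `listDirRaw` is hypothetical and chosen only for this example.

```haskell
import Control.Exception (bracket)
import qualified Data.ByteString.Char8 as B
import System.Posix.ByteString (RawFilePath)
import System.Posix.Directory.ByteString
  (openDirStream, readDirStream, closeDirStream)

-- Hypothetical helper (not part of posix-paths): list the entries of one
-- directory as raw ByteString paths, with no String conversion.
-- The result includes the "." and ".." entries.
listDirRaw :: RawFilePath -> IO [RawFilePath]
listDirRaw dir = bracket (openDirStream dir) closeDirStream loop
  where
    loop ds = do
      entry <- readDirStream ds
      -- readDirStream signals the end of the stream with an empty path.
      if B.null entry
        then return []
        else (entry :) <$> loop ds
```

Calling `listDirRaw (B.pack "/usr/local")` returns the entry names as ByteStrings, so no per-character String decoding takes place.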
`getDirectoryContents` is (unsurprisingly?) really slow.
Our replacement is 11 times faster in the recursive use case [1], and
only 20% slower than `find`.
Benchmarks are at [2], code is at [3].
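For context, the String-based recursive baseline mentioned in [1] is along the following lines. This is a sketch after the Real World Haskell chapter, not a copy of the code in our benchmark suite, so details may differ from what Bench.hs actually runs.

```haskell
import Control.Monad (forM)
import System.Directory (doesDirectoryExist, getDirectoryContents)
import System.FilePath ((</>))

-- Recursively collect all non-directory paths below topdir,
-- using String-based FilePaths throughout.
getRecursiveContents :: FilePath -> IO [FilePath]
getRecursiveContents topdir = do
  names <- getDirectoryContents topdir
  -- Skip the self and parent entries to avoid infinite recursion.
  let properNames = filter (`notElem` [".", ".."]) names
  paths <- forM properNames $ \name -> do
    let path = topdir </> name
    -- One extra file system query per entry to decide whether to recurse.
    isDirectory <- doesDirectoryExist path
    if isDirectory
      then getRecursiveContents path
      else return [path]
  return (concat paths)
```

Note the extra `doesDirectoryExist` call per entry and the String allocation for every path; both add overhead in a version like this.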
We hope that these improvements will eventually make it into base.
Until then, we propose our package as a base for discussion and further
improvements.
Contributions are welcome:
Some FilePath operations are not in it yet (especially the Windows /
drive related ones), and our traversals might not work on Windows.
We would also appreciate a thorough review of their low-level
implementations.
If you find our benchmarks against getDirectoryContents unfair or would
like to add another one, please send a pull request.
We have been running this on Linux production machines for a few months
now, and are pleased by the speed-up.
[1] For the recursive version of the original `getDirectoryContents`, we
used the implementation given in Real World Haskell:
http://book.realworldhaskell.org/read/io-case-study-a-library-for-searching-the-filesystem.html
[2] Benchmarks:
On a real file system: http://johnlato.github.io/posix-paths/usrLocal.html
On tmpfs: http://johnlato.github.io/posix-paths/tmpfs.html (note that
here find is slow because of process starting overhead)
[3] Code:
Github: https://github.com/JohnLato/posix-paths
RawFilePath operations:
https://github.com/JohnLato/posix-paths/blob/master/src/System/Posix/FilePath.hs
Traversals:
https://github.com/JohnLato/posix-paths/blob/master/src/System/Posix/Directory/Traversals.hs
Benchmarks:
https://github.com/JohnLato/posix-paths/blob/master/benchmarks/Bench.hs