[Haskell-beginners] Processing a list of files the Haskell way
Michael Schober
Micha-Schober at web.de
Thu Mar 15 12:14:59 CET 2012
On 03/14/2012 01:04 PM, Chaddaï Fouché wrote:
> In my own tests it wasn't very fast : 10min to check a 25GB hierarchy of
> music files, using 13MB of memory maximum. Though I must admit that I
> didn't try to find similar tools to compare so I'm not too certain of
> normal performance times.
I finally got it running after a Haskell-platform update from source to
resolve some dependencies, but now it works like a charm. Runtime isn't
an issue for me so far and memory consumption seems to be in acceptable
levels, so it's a very good start - thanks again.
I took the liberty to modify the output a little bit to my needs - maybe
a future reader will find it helpful, too. It's attached below.
Best,
Michael
--
module Main where
import Data.Conduit.Filesystem (traverse)
import qualified Data.Conduit.List as CL
import Data.Conduit
import Data.Digest.Pure.MD5 (MD5Digest)
import Crypto.Conduit (hashFile)
import qualified Data.Map as M
import qualified Filesystem.Path.CurrentOS as FP
import System.Environment
duplicates :: FilePath -> IO [(MD5Digest,[FilePath])]
duplicates dir = runResourceT $ do
md5s <- traverse False (FP.decodeString dir) $$ CL.mapM process =$
CL.fold buildMap M.empty
return . M.assocs . M.filter ((>1).length) $ md5s
where
process :: FP.FilePath -> IO (MD5Digest, FilePath)
process fp = do
let strFp = FP.encodeString fp
md5 <- hashFile strFp
return (md5,strFp)
buildMap m (md5,fp) = M.insertWith' (flip (++)) md5 [fp] m
main = do
[dir] <- getArgs
putStrLn . unlines . map (\(md5,paths) -> (show md5) ++ "-->\n" ++
(unlines paths)) =<< duplicates dir
More information about the Beginners
mailing list