[Haskell-cafe] haskell and releasing resources

Fawzi Mohamed fmohamed at mac.com
Tue Feb 6 08:38:04 EST 2007


I am just coming to Haskell, and I wrote a simple command to get some
input from a pdf file.

I just wanted the output of the command, so I did something like:

import System.Process (runInteractiveCommand)
import System.IO (hGetContents)

-- | Returns the text of the first page of the pdf at the given path;
-- needs pdftotext.
getTextOfPdf :: String -> IO String
getTextOfPdf pdfPath = do
     -- The process handle (_pid) is never waited on here.
     (_inp, out, _err, _pid) <-
         runInteractiveCommand ("pdftotext -l 1 " ++ pdfPath ++ " -")
     hGetContents out

I don't care about error handling; if something goes wrong it is ok
to hang or crash. But knowing unix, I wondered if this would do the
right thing or if it would create a zombie process.

I was about to ask, but then I thought "let's test it", and sure
enough the zombie stays there.
I even tried spawning more than one and waiting, and I managed to
exhaust the resources of my machine...

So here is what I would have liked to happen: when the pid gets  
garbage collected it tries to wait for the process, and if that fails  
the pid stays around longer and will try to wait later.

Too difficult? I don't know, but it is what I had expected from
Haskell. Failing that, I would have expected clear hints in the
documentation that one should wait for external processes, and I found
none.

So what is the way out? I could do a forkIO and wait for the process
there, but then I wonder: do I have to wait for the thread, or do dead
threads become zombies too?
If that is the case, then the only way out would be to also give back
the pid and make waiting the responsibility of the caller; not very
nice, but probably the real solution.
I have seen that in MissingH there is a pipeout example, but it seems
to me that you are still responsible for waiting for the process (with
ensureSuccess).
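
To make the "wait for the process" option concrete, here is a minimal sketch, not a definitive implementation. The helper name runAndReap is hypothetical; it assumes the whole output fits in memory and that the child writes little to stderr (stderr is not drained here, so a very chatty child could block on it).

```haskell
import Control.Exception (evaluate)
import System.Exit (ExitCode (..))
import System.IO (hClose, hGetContents)
import System.Process (runInteractiveProcess, waitForProcess)

-- | Hypothetical helper: run a program, collect all of its stdout,
-- then wait for it so that the child is reaped and no zombie remains.
runAndReap :: FilePath -> [String] -> IO (ExitCode, String)
runAndReap prog args = do
  (inp, out, err, pid) <- runInteractiveProcess prog args Nothing Nothing
  hClose inp                    -- we send no input to the child
  text <- hGetContents out      -- lazy read of stdout ...
  _ <- evaluate (length text)   -- ... forced completely before waiting
  code <- waitForProcess pid    -- blocks until the child exits, reaping it
  hClose err                    -- stderr is ignored in this sketch
  return (code, text)

-- | First page of the pdf as text; needs pdftotext on the PATH.
getTextOfPdf :: FilePath -> IO String
getTextOfPdf pdfPath =
  fmap snd (runAndReap "pdftotext" ["-l", "1", pdfPath, "-"])
```

Passing the arguments as a list also avoids shell quoting problems with the path. The price is that the output is no longer lazy: it is read completely before the function returns, which is what makes it safe to wait at that point.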

Maybe the correct thing is not being able to ignore the return code
of the process... but now I am becoming suspicious of other things.
For example a file handle: do you need to close it, or can you expect
that it will be closed when the handle is garbage collected?
Are there other places where you need to pay attention to be sure
that you are releasing the resources you acquired?
I suppose problems can come only when an external resource is
involved, or not?
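
For file handles, one common way to avoid depending on the garbage collector is to tie the lifetime of the handle to a scope. The sketch below uses withFile from System.IO and bracket from Control.Exception; the function names firstLine and firstLine' are only illustrative.

```haskell
import Control.Exception (bracket)
import System.IO (IOMode (ReadMode), hClose, hGetLine, openFile, withFile)

-- | Read the first line of a file. 'withFile' closes the handle when
-- the action finishes, whether normally or through an exception.
firstLine :: FilePath -> IO String
firstLine path = withFile path ReadMode hGetLine

-- | The same pattern spelled out with 'bracket':
-- acquire the resource, say how to release it, then use it.
firstLine' :: FilePath -> IO String
firstLine' path = bracket (openFile path ReadMode) hClose hGetLine
```

hGetLine reads the line strictly, so the result is still valid after the handle has been closed. A lazy hGetContents inside withFile would need to be forced before the scope ends.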

So

1) documentation should specify if one should do some specific action  
to free resources, can someone fix this?

2) is there a clean (Haskell ;) way to deal with this?

3) are there other places, apart from external processes, where you
have to pay attention?

What is your wisdom...
thanks

Fawzi

